We introduce penalty-function-based admission control policies to approximately maximize the expected reward rate in a loss network. These control policies are easy to implement and perform well both in the transient period and in steady state. A major advantage of the penalty approach is that it avoids solving the associated dynamic program. A disadvantage, however, is that it requires the capacity requested by individual requests to be sufficiently small compared to the total available capacity. We first solve a related deterministic linear program (LP) and then translate an optimal solution of the LP into an admission control policy for the loss network via an exponential penalty function. We show that the penalty policy is a target-tracking policy: it performs well because the optimal solution of the LP is a good target. We demonstrate that the penalty approach can be extended to track arbitrarily defined target sets. Results from preliminary simulation studies are included.

1. Introduction. We consider the following dynamic stochastic allocation problem (details in Section 2). The stochastic system consists of a network of resources (facilities), each with a known fixed capacity. Requests for using this network belong to a diverse set of request classes, differing in the arrival rate, the service duration, the resource requirements and the willingness to pay. There is no waiting room (queue); therefore, an arriving request must be either admitted into the system for service and assigned an appropriate resource allocation or rejected (lost) at the instant it arrives. An admitted request occupies the allocated resources for the service duration and releases all the resources simultaneously. The objective of the system controller is to design an admission control policy that optimizes an appropriate performance measure of the revenue generated.

The stochastic model detailed above is known as a loss network. Loss networks model a wide variety of applications where a diverse user population shares a limited collection of resources, for example, telephone networks, local area networks, multiprocessor interconnection architectures, database structures, mobile radio and broadband packet networks [see Ott and Krishnan (1992), Hui (1990), Kelly (1985), Lagarias, Odlyzko and Zagier (1985), Mitra and Weinberger (1987) and Mitra, Morrison and Ramakrishnan (1996), for details]. Kelly (1991) gave an excellent review of results for loss networks. For a discussion of a related model with loss queues in series, see Ku and Jordan (1997).

A loss network with a single resource is known as a stochastic knapsack [Ross and Tsang (1989b)]. Optimality results have been obtained for several restricted classes of admissible policies: complete partitioning policies [Ross and Tsang (1989b)], coordinate convex policies [Foschini and Gopinath (1983), Ross and Tsang (1989b) and Jordan and Varaiya (1994)] and restricted complete sharing policies [Gavious and Rosberg (1994)]. Ross and Yao (1990) discussed monotonicity properties for the stochastic knapsack. See Ross (1995) for a summary of these results.

When capacity requests and service durations of all the request classes are identical, the optimal policy for the stochastic knapsack problem has the following simple form: Accept class $i$ requests if there are at least $\delta_i$ units of capacity free. Such a policy is called a trunk reservation policy and the parameters $\delta_i$ are called trunk reservation parameters. This result was established by Miller (1969) [see also Lippman and Ross (1971)]. Several approaches to compute (approximately) optimal trunk reservation parameters $\delta_i$ were discussed by Key (1990), Bean, Gibbens and Zachary (1995) and Reiman and Schwartz (2001). Trunk reservation policies are not optimal when the capacity request or service duration is class dependent [Ross and Tsang (1989a)], nor are they optimal for networks [Key (1990)]. The asymptotic optimality of trunk reservation policies under a limiting regime where the arrival rates and capacity increase together, the Halfin-Whitt regime [Halfin and Whitt (1981)], was established by Hunt and Laws (1993, 1997). For asymptotic optimality results under different limiting regimes, see Kelly (1991), Hunt and Kurtz (1994) and Key (1994).
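
The trunk reservation rule described above can be sketched in a few lines. This is only an illustration: the function name, the argument layout and the numbers in the usage note are assumptions made here, not notation from the literature cited.

```python
def trunk_reservation_admit(request_class, used_capacity, total_capacity, reserve):
    """Return True if an arriving request of `request_class` is admitted.

    reserve[i] plays the role of the trunk reservation parameter of class i:
    the request is accepted only if at least that many units are free.
    (Hypothetical helper; the parameter names are illustrative assumptions.)
    """
    free = total_capacity - used_capacity
    return free >= reserve[request_class]
```

For instance, with 10 units of capacity of which 8 are in use and reservation parameters `[1, 3]`, a class 0 request would be admitted while a class 1 request would be rejected.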

The optimal capacity allocation problem has also been extensively studied in the revenue management literature. For a recent overview, see McGill and van Ryzin (1999). Unlike the model introduced here, capacity allocation models in the revenue management literature typically assume that there is a finite time horizon over which the capacity must be allocated and that capacity once allocated never becomes available again. Our model is closer to that developed by Savin, Cohen, Gans and Katalan (2000) in the context of the rental industry.

In all previous works on related stochastic allocation models, the associated optimization problem is formulated as a dynamic program (DP), and the optimal policy is the solution of the associated Bellman equation. However, solving the Bellman equation quickly becomes computationally intractable and is, in many cases, EXP-complete [Papadimitriou and Tsitsiklis (1999) and Blondel and Tsitsiklis (2000)]. In practice, therefore, the DP formulation is only used to characterize certain qualitative structural properties of the optimal policy, which then form the basis for heuristic approaches for solving the problem. Optimal DP policies are very sensitive to the time horizon of the problem. Due to end-effects, the optimal DP policies that correspond to different time horizons are usually not compatible. Also, there is no guarantee that steady state optimal policies [e.g., the independent thinning policy; Kelly (1991)] will perform well in the transient period.

In this article, we explore alternative simpler techniques for characterizing approximately optimal policies. We replace the stochastic optimization problem by a suitably constructed linear program (LP). The optimal solution of this LP yields a target point that is translated into an admission control policy using an exponential penalty function. We show that this policy is approximately optimal in the limit where individual resource requests are small compared to the total capacity [Halfin and Whitt (1981)]. Moreover, we show that this penalty policy performs well in the transient period as well.

Our penalty-based approach builds on several disparate research ideas: convex programming bounds for stochastic problems [Gibbens and Kelly (1995), Bertsimas, Paschalidis and Tsitsiklis (1994), Bertsimas and Niño-Mora (1999a, b) and Bertsimas and Chryssikou (1999)], asymptotically optimal policies for control and scheduling problems via “fluid” relaxations [Maglaras (2000), Bertsimas and Sethuraman (2002) and Bertsimas, Sethuraman and Gamarnik (2003)] and exponential penalty-based approximation algorithms for linear programming [Shahrokhi and Matula (1990), Plotkin, Shmoys and Tardos (1991) and Bienstock (2002)]. Exponential penalty functions have also proved useful for admission control and load balancing in an adversarial setting [Aspnes, Azar, Plotkin and Waarts (1997), Azar, Kalyanasundaram, Plotkin, Pruhs and Waarts (1997) and Kamath, Palmon and Plotkin (1998)]. Of these, Kamath, Palmon and Plotkin (1998) is the most relevant to the discussion here.

The summary of our contributions in this article is as follows: 

(i) We develop explicit upper bounds for the maximum achievable revenue rate for any time $t > 0$. This extends the analysis in Gibbens and Kelly (1995).

(ii) We construct an exponential penalty-based admission control policy that is provably approximately optimal for all times $t > 0$ in the Halfin-Whitt limiting regime [Halfin and Whitt (1981)]. The policy is a simple threshold-type policy in an expanded state space. Preliminary simulation studies (see Section 3.4) suggest that the state space expansion is the key to the success of the penalty policy.

(iii) We demonstrate that our approach can be extended to track arbitrary polyhedral target sets.

The organization of this article is as follows. In Section 2 we formulate the admission control problem for a loss network. The framework is Markovian, that is, the arrivals are Poisson and service times are exponentially distributed. In Section 3 we study the single-resource model and its variants. Section 3.4 contains simulation results for this special case and Section 3.5 extends some of the results to the case of general service time distributions. In Section 4 we extend the single-resource results to the network problem. Section 5 presents an extension to control problems where the objective is to ensure that the state of the network lies in a specified target set. Section 6 has some concluding comments and discussion.

2. Admission control in loss networks. The stochastic system under consideration consists of a network of $s$ resources (facilities) with capacity $b \in \mathbb{R}^s_+$, where $b(k) > 0$ is the capacity of resource $k = 1, \ldots, s$. Requests for using this network belong to $m$ independent Poisson arrival classes. Class $i$ requests have an arrival rate $\lambda_i$ and a service duration $S_i \sim \exp(\mu_i)$; that is, $S_i$ is exponentially distributed with rate $\mu_i$ (with the exception of Section 3.5). Class $i$ requests are willing to accept any capacity allocation from the set $B_i = \{b_{i1}, \ldots, b_{il_i}\}$, $b_{ij} \in \mathbb{R}^s_+$, and pay $r_i$ per unit time for the (random) service duration $S_i$. There is no waiting room in the system; therefore, each arriving class $i$ request must either be accepted and admitted into the system (i.e., assigned an admissible capacity allocation $b_{ij} \in B_i$) or be rejected at the instant it arrives. When an accepted request departs after service completion, it releases all the allocated resources simultaneously.

We assume that the system is initially empty, that is, $x(0^-) = 0$ (see Remark 1 in Section 3.1 for a discussion on nonzero initial states). Let $x_{ij}(t)$ denote the number of class $i$ requests currently in the system that are assigned to the allocation $b_{ij} \in B_i$. Define $x_i(t) = (x_{i1}(t), \ldots, x_{il_i}(t)) \in \mathbb{Z}^{l_i}_+$ and $x(t) = (x_1(t), \ldots, x_m(t)) \in \mathbb{Z}^{l}_+$, where $l = \sum_{i=1}^m l_i$. A request of class $i$ can be assigned a capacity allocation $b_{ij}$ only if there is sufficient capacity to accommodate it, that is,

\[
\sum_{i'=1}^{m} \sum_{j'=1}^{l_{i'}} x_{i'j'}(t)\, b_{i'j'} + b_{ij} \le b, \tag{1}
\]

where the inequality is interpreted component by component. The system controller is permitted to reject requests even if there is sufficient capacity to accommodate them. The instantaneous reward rate $R(t)$ at time $t$ is given by

\[
R(t) = \sum_{i=1}^{m} r_i \Bigl( \sum_{j=1}^{l_i} x_{ij}(t) \Bigr) = \sum_{i=1}^{m} r_i \mathbf{1}^T x_i(t). \tag{2}
\]

This stochastic model is called a loss network [Kelly (1991)].
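
The feasibility condition (1) and the reward rate (2) translate directly into code. The sketch below is only illustrative: the nested-list layout for the state and the allocation sets, and the function names, are assumptions made here.

```python
def can_allocate(x, allocations, b, i, j):
    """Check condition (1): can a class-i request take allocation j?

    x[i][j]           -- number of class-i requests holding allocation j
    allocations[i][j] -- capacity vector b_ij (one entry per resource)
    b                 -- capacity vector of the network
    (Data layout is an assumption made for this sketch.)
    """
    s = len(b)
    used = [0.0] * s
    for counts, vectors in zip(x, allocations):
        for count, vec in zip(counts, vectors):
            for k in range(s):
                used[k] += count * vec[k]
    # The inequality is checked component by component.
    return all(used[k] + allocations[i][j][k] <= b[k] for k in range(s))


def reward_rate(x, r):
    """Instantaneous reward rate (2): sum over classes of r_i times the class count."""
    return sum(r_i * sum(counts) for r_i, counts in zip(r, x))
```

With two resources of capacity 2 each, class 0 able to use either resource alone and class 1 needing one unit of both, a state holding one request of each class leaves room only for a class 0 request on the second resource.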

Let $T_{(i,n)}$, $i = 1, \ldots, m$, $n \ge 1$, denote the arrival epoch of the $n$th class $i$ request. Since all admission decisions are made at arrival epochs, a feasible admission control policy $\pi$ is described as follows:

(a) A policy $\pi$ is a collection of random variables $\pi = \{\pi_{(i,n)} : i = 1, \ldots, m,\ n \ge 1\}$, with $\pi_{(i,n)} \in \{0, 1, \ldots, l_i\}$, where $\pi_{(i,n)} = 0$ denotes that the class $i$ request arriving at the epoch $T_{(i,n)}$ is rejected and $\pi_{(i,n)} = j$ ($\ge 1$) denotes that the request is assigned to $b_{ij} \in B_i$.

(b) The random variable $\pi_{(i,n)}$ is measurable with respect to the $\sigma$-algebra generated by the past arrival epochs $\{T_{(p,q)} : p = 1, \ldots, m,\ q \ge 1,\ T_{(p,q)} < T_{(i,n)}\}$, the past actions $\{\pi_{(p,q)} : p = 1, \ldots, m,\ q \ge 1,\ T_{(p,q)} < T_{(i,n)}\}$ and the state process $\{x^\pi(t) : t < T_{(i,n)}\}$, where the notation emphasizes that the state process is itself a function of past actions.

(c) The state process $\{x^\pi(t) : t \ge 0\}$ does not violate capacity constraints, that is, $\sum_{i=1}^{m} \sum_{j=1}^{l_i} x^\pi_{ij}(t)\, b_{ij} \le b$ for all $t \ge 0$. (Rejection is the only feasible action when adequate capacity is not available.)

Let $R^\pi(t) = \sum_{i=1}^{m} r_i (\mathbf{1}^T x_i^\pi(t))$ denote the instantaneous reward rate of the policy $\pi$ at time $t$. The objective of the controller is to choose a feasible policy $\pi$ that maximizes some performance measure on the reward rate process $\{R^\pi(t) : t \ge 0\}$. Appropriate performance measures for finite time horizon problems are either expected total reward $E[\int_0^T R^\pi(s)\,ds]$ or expected discounted reward $E[\int_0^T e^{-\beta s} R^\pi(s)\,ds]$, $\beta > 0$; for the infinite time horizon problems, the appropriate measures are either expected discounted reward $E[\int_0^\infty e^{-\beta s} R^\pi(s)\,ds]$, $\beta > 0$, or long-run average reward $\lim_{T \to \infty} \frac{1}{T} E[\int_0^T R^\pi(s)\,ds]$.

As mentioned in Section 1, our goal is to construct feasible policies that perform well both in the transient period and in steady state. We first establish an upper bound $R^*(t)$ on the achievable expected reward rate $E[R^\pi(t)]$ and then construct a feasible policy with expected reward rate $E[R(t)] \approx R^*(t)$. Thus, the constructed policy satisfies

\[
E\Bigl[\int_0^T e^{-\beta s} R^\pi(s)\,ds\Bigr] \le \int_0^T e^{-\beta s} R^*(s)\,ds \approx E\Bigl[\int_0^T e^{-\beta s} R(s)\,ds\Bigr], \qquad \beta \ge 0,
\]

that is, the policy is approximately optimal for any finite time horizon, and

\[
\lim_{T \to \infty} \frac{1}{T} E\Bigl[\int_0^T R^\pi(s)\,ds\Bigr] \le \lim_{T \to \infty} \frac{1}{T} \int_0^T R^*(s)\,ds \approx \lim_{T \to \infty} \frac{1}{T} E\Bigl[\int_0^T R(s)\,ds\Bigr],
\]

that is, the policy is approximately optimal in the steady state as well.


3. Single-resource model. This section focuses on the loss network with $s = 1$ (i.e., the stochastic knapsack). The details of the single-resource model are as follows. The system is assumed to be initially empty [i.e., $x(0^-) = 0$]. Requests belong to $m$ Poisson arrival classes. Request class $i$ has arrival rate $\lambda_i$, capacity request $b_i$ (without loss of generality, one can assume that the set $B_i$ is a singleton), service duration $S_i \sim \exp(\mu_i)$ and reward rate $r_i$ per unit time. All the requests arrive at a common resource with capacity $b \in (0, \infty)$. There is no waiting space (queue); therefore, each arriving request must either be admitted into service or rejected at the instant it arrives [see Cosyn and Sigman (2004) and Cosyn (2003) for extensions to queues]. Requests may be rejected even if adequate capacity is available.

Note that if the total capacity $b$ is an integer and $b_i = 1$, $1 \le i \le m$, then $b$ can be identified as the number of servers in a standard queueing model. In particular, if requests are always served when capacity exists, then this is simply an M/M/b loss queue. Thus, it helps to imagine that each accepted request has its own server. In this light, the loss network introduced in Section 2 can be viewed as a collection of such server models, all working together in parallel.
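
For orientation, the stationary blocking probability of a single-class M/M/b loss queue is given by the classical Erlang B formula. A minimal sketch using the standard stable recursion is shown below; the function name is an assumption made here, and the offered load stands for the ratio of arrival rate to service rate.

```python
def erlang_b(offered_load, servers):
    """Erlang B blocking probability for an M/M/b loss queue.

    Uses the recursion B(0) = 1, B(n) = a*B(n-1) / (n + a*B(n-1)),
    where a is the offered load (arrival rate divided by service rate).
    """
    blocking = 1.0
    for n in range(1, servers + 1):
        blocking = offered_load * blocking / (n + offered_load * blocking)
    return blocking
```

This describes the uncontrolled system only; the admission policies studied in this article deliberately reject some requests even when a server is free.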

The layout of this section is as follows. In Section 3.1 we develop an 
upper bound on the achievable reward rate. In Section 3.2 we construct 
an approximately optimal penalty-based policy. Section 3.3 investigates the 
penalty policy in the Halfin-Whitt limiting regime [Halfin and Whitt (1981)].
In Section 3.4 we simulate the transient behavior of the proposed control 
policy and compare its performance to thinning policies introduced by Kelly 
(1991). Section 3.5 discusses the extension to general service times. 

3.1. Upper bound on the achievable reward rate. Let $\pi$ denote any feasible control policy for the single-resource model. Let $x_i^\pi(t)$ denote the number of class $i$ requests in service at time $t$. Since feasibility implies that $\sum_{i=1}^m b_i x_i^\pi(t) \le b$, we have

\[
\sum_{i=1}^{m} b_i E[x_i^\pi(t)] \le b. \tag{3}
\]

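Moreover, $E[x_i^\pi(t)] \le E[q_i(t)]$, where $q_i(t)$ is the number of class $i$ requests at time $t$ in an infinite capacity system with no admission control. Recall that we assume that the system is initially empty; therefore [see, e.g., page 75 in Wolff (1989)], $E[q_i(t)] = \rho_i(1 - \exp(-\mu_i t))$, where $\rho_i = \lambda_i/\mu_i$. Hence,

\[
\alpha = \Bigl( \frac{E[x_1^\pi(t)]}{\rho_1}, \ldots, \frac{E[x_m^\pi(t)]}{\rho_m} \Bigr)
\]

is feasible for the linear program

\[
\begin{aligned}
\text{maximize} \quad & \sum_{i=1}^{m} r_i \rho_i \alpha_i \\
\text{subject to} \quad & \sum_{i=1}^{m} b_i \rho_i \alpha_i \le b, \\
& 0 \le \alpha_i \le 1 - \exp(-\mu_i t), \quad i = 1, \ldots, m.
\end{aligned} \tag{4}
\]

Let $\alpha^*(t)$ denote an optimal solution and let $R^*(t)$ denote the optimal value of (4). Then

\[
E[R^\pi(t)] \le R^*(t). \tag{5}
\]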
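
Because LP (4) has a single capacity constraint plus simple upper bounds, it is a fractional knapsack problem and can be solved greedily by filling classes in decreasing order of reward per unit of capacity, $r_i/b_i$. The sketch below illustrates this; the function name and argument layout are assumptions made here, and it assumes $r_i$, $b_i$ and $\rho_i$ are all strictly positive.

```python
import math


def solve_transient_lp(r, b_req, rho, mu, b, t):
    """Greedy solution of the single-resource LP (4) at time t.

    r, b_req, rho, mu -- per-class reward rates, capacity requests,
                         offered loads and service rates (all assumed > 0)
    b                 -- total capacity
    Returns (alpha, value).  Illustrative sketch only.
    """
    m = len(r)
    upper = [1.0 - math.exp(-mu[i] * t) for i in range(m)]
    # Classes with the largest reward per unit of capacity are filled first.
    order = sorted(range(m), key=lambda i: r[i] / b_req[i], reverse=True)
    alpha = [0.0] * m
    remaining = b
    for i in order:
        need = b_req[i] * rho[i] * upper[i]  # capacity used if alpha_i hits its bound
        if need <= remaining:
            alpha[i] = upper[i]
            remaining -= need
        else:
            alpha[i] = remaining / (b_req[i] * rho[i])
            break
    value = sum(r[i] * rho[i] * alpha[i] for i in range(m))
    return alpha, value
```

For small $t$ the upper bounds $1 - \exp(-\mu_i t)$ bind and every class is admitted at its uncontrolled rate; for large $t$ the capacity constraint binds and only the classes with the best reward-to-capacity ratio are served in full.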


In the next section we propose a policy that controls the system by penalizing deviations from a desired target state. From (4) and (5), it follows that for a policy $\pi$ to be approximately optimal, the expected number $E[x_i^\pi(t)]$ of accepted class $i$ requests must be approximately $x_i^*(t) = \alpha_i^*(t)\rho_i$. Thus, $x^*(t) = (\alpha_1^*(t)\rho_1, \ldots, \alpha_m^*(t)\rho_m)$ would be the natural target state for the penalty policy. Unfortunately, we are only able to establish that a penalty policy can successfully track a fixed target. The natural fixed target is $x_i^* = \alpha_i^*\rho_i$, $i = 1, \ldots, m$, where $\alpha^* = (\alpha_1^*, \ldots, \alpha_m^*)^T$ is an optimal solution of the “steady state” analog of (4):

\[
\begin{aligned}
\text{maximize} \quad & \sum_{i=1}^{m} r_i \rho_i \alpha_i \\
\text{subject to} \quad & \sum_{i=1}^{m} b_i \rho_i \alpha_i \le b, \\
& 0 \le \alpha_i \le 1, \quad i = 1, \ldots, m.
\end{aligned} \tag{6}
\]

Let $R^*$ denote the optimal value of (6). Next, we bound $R^*(t)$ in terms of the steady state quantities $\alpha^*$, $R^*$ and the problem parameters. Since any $\alpha$ feasible for (4) must satisfy $\alpha_i \le 1 - \exp(-\mu_i t)$, $i = 1, \ldots, m$, it follows that

\[
R^*(t) \le \sum_{i=1}^{m} r_i \rho_i (1 - \exp(-\mu_i t)). \tag{7}
\]

The linear programming dual of (4) is

\[
\begin{aligned}
\text{minimize} \quad & u b + \sum_{i=1}^{m} v_i (1 - \exp(-\mu_i t)) \\
\text{subject to} \quad & v_i + b_i \rho_i u \ge r_i \rho_i, \quad i = 1, \ldots, m, \\
& v \ge 0, \quad u \ge 0.
\end{aligned} \tag{8}
\]

Taking the limit $t \to \infty$ in (8) we get the dual of the steady state LP (6):

\[
\begin{aligned}
\text{minimize} \quad & u b + \mathbf{1}^T v \\
\text{subject to} \quad & v_i + b_i \rho_i u \ge r_i \rho_i, \quad i = 1, \ldots, m, \\
& v \ge 0, \quad u \ge 0.
\end{aligned} \tag{9}
\]

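Let $(u^*, v^*)$ denote any optimal solution of (9), $U = \{i : \alpha_i^* = 1\}$ and $U^c = \{i : i \notin U\}$. Then it follows that

\[
\begin{aligned}
R^*(t) &\le u^* b + \sum_{i=1}^{m} v_i^* (1 - \exp(-\mu_i t)) && (10) \\
&= \sum_{i=1}^{m} r_i \rho_i \alpha_i^* - \sum_{i \in U} v_i^* \exp(-\mu_i t) && (11) \\
&= \sum_{i=1}^{m} r_i \rho_i \alpha_i^* - \sum_{i \in U} (r_i \rho_i - b_i \rho_i u^*) \alpha_i^* \exp(-\mu_i t) && (12) \\
&= \sum_{i=1}^{m} r_i \rho_i \alpha_i^* (1 - \exp(-\mu_i t)) + u^* \Bigl( \sum_{i=1}^{m} b_i \rho_i \alpha_i^* \exp(-\mu_i t) \Bigr) && (13) \\
&\le \sum_{i=1}^{m} r_i \rho_i \alpha_i^* (1 - \exp(-\mu_i t)) + u^* b \exp(-\mu_{\min} t), && (14)
\end{aligned}
\]

where (10) is implied by the fact that $(u^*, v^*)$ is feasible for the dual LP (8); (11)-(13) all follow from complementary slackness conditions [Luenberger (1984)]; and $\mu_{\min} = \min_{1 \le i \le m}\{\mu_i\}$. From (7) and (14) we have the following result.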


Theorem 1. The reward rate $R^\pi(t)$ of any feasible policy $\pi$ satisfies

\[
E[R^\pi(t)] \le R^*(t) \le \min\Bigl\{ \sum_{i=1}^{m} r_i \rho_i (1 - \exp(-\mu_i t)),\ \sum_{i=1}^{m} r_i \rho_i \alpha_i^* (1 - \exp(-\mu_i t)) + u^* b \exp(-\mu_{\min} t) \Bigr\}, \tag{15}
\]

where $R^*(t)$ is the optimal value of the LP (4), $\alpha^*$ is an optimal solution of the steady state LP (6) and $(u^*, v^*)$ is an optimal solution of the steady state dual LP (9).

The first term in the upper bound on $R^*(t)$ is active for $t < 1/\mu_{\max}$, where $\mu_{\max} = \max_{1 \le i \le m}\{\mu_i\}$, whereas the second is active for $t > 1/\mu_{\min}$.
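
The closed-form bound (15) is cheap to evaluate once a steady state primal solution $\alpha^*$ and dual multiplier $u^*$ are available. The sketch below takes both as inputs rather than computing them; the function name and the numerical instance in the usage note are assumptions made for illustration.

```python
import math


def theorem1_bound(r, rho, mu, alpha_star, u_star, b, t):
    """Evaluate the right-hand side of (15) at time t.

    alpha_star and u_star are assumed to be optimal for the steady state
    LP (6) and its dual (9); they are inputs, not computed here.
    """
    m = len(r)
    decay = [math.exp(-mu[i] * t) for i in range(m)]
    # First term: bound (7), every class at its uncontrolled transient level.
    transient = sum(r[i] * rho[i] * (1.0 - decay[i]) for i in range(m))
    # Second term: bound (14), steady state target plus a decaying correction.
    steady = sum(r[i] * rho[i] * alpha_star[i] * (1.0 - decay[i]) for i in range(m))
    steady += u_star * b * math.exp(-min(mu) * t)
    return min(transient, steady)
```

As a made-up instance, take two classes with $r = (2, 1)$, $\rho = (10, 10)$, $\mu = (1, 1)$, unit capacity requests and $b = 10$; then $\alpha^* = (1, 0)$ and $u^* = 1$ are optimal for (6) and (9), the bound is 0 at $t = 0$ and approaches the steady state value 20 as $t$ grows.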

Remark 1. Although we assume that the system is initially empty, all the results in this article extend to the case where the initial state $x(0^-) \ne 0$. For example, when $x(0^-) = x^0 \ne 0$, the bound analogous to (15) is given by

\[
\min\Bigl\{ \sum_{i=1}^{m} r_i \rho_i (1 - \exp(-\mu_i t)) + \sum_{i=1}^{m} r_i x_i^0 \exp(-\mu_i t),\ \sum_{i=1}^{m} r_i \rho_i \alpha_i^* (1 - \exp(-\mu_i t)) + u^* b \exp(-\mu_{\min} t) + \sum_{i=1}^{m} \frac{v_i^* x_i^0}{\rho_i} \exp(-\mu_i t) \Bigr\}.
\]

The results in this section bear close resemblance to the notion of fluid operating points introduced by Harrison (2003). However, unlike the development here, Harrison employed the fluid model only to define a nominal operating point; the control policy is designed using a heavy-traffic limit associated with this operating point.

3.2. Exponential penalty function and penalty control policy. Kelly (1991) established that, under fairly general conditions, an independent thinning policy that accepts each incoming class $i$ request with probability $\alpha_i^*$, provided there is enough capacity, approximately optimizes the expected reward rate in steady state. However, for small $t$, thinning underutilizes the capacity and, therefore, the expected reward rate of the thinning policy is significantly smaller than the upper bound (7). Moreover, since thinning only changes the effective arrival rate, it is not able to effectively control the variance of the reward rate. Our goal is to construct a policy that does not suffer from these drawbacks. We will first informally motivate the structure of the policy and then establish its properties rigorously.

Consider the following modification to the original system. Suppose each rejected class $i$ request, instead of immediately leaving the system, is assigned to an alternate infinite capacity server where it lives out its service time and then leaves. [In practice, each time a request is rejected the policy will add one request to the alternate server with a service time $S_i \sim \exp(\mu_i)$.] From the analysis leading to the LP (4), it follows that for the expected reward rate $E[R(t)]$ to be close to the bound (15), one requires $E[x_i(t)] \approx x_i^*(t) = \alpha_i^*(t)\rho_i$, $i = 1, \ldots, m$. Let $y_i(t)$ denote the number of class $i$ requests in the alternate server at time $t$. Then $E[x_i(t)] + E[y_i(t)] = E[q_i(t)] = \rho_i(1 - \exp(-\mu_i t))$. Thus, an equivalent condition for optimality is that $E[y_i(t)] \approx y_i^*(t) = \rho_i(1 - \exp(-\mu_i t) - \alpha_i^*(t))$. Let $\Psi_i(x_i, y_i)$ be a penalty function that penalizes deviations from the desired target state $(x_i^*(t), y_i^*(t))$. Since keeping $(x_i, y_i) \approx (x_i^*(t), y_i^*(t))$ is equivalent to minimizing the penalty function, a control policy that accepts a request, provided there is adequate capacity and $\Psi_i(x_i + 1, y_i) \le \Psi_i(x_i, y_i + 1)$, may be close to optimal. Such a policy can be thought of as iteratively solving the nonlinear optimization problem $\min_{x,y} \Psi_i(x, y)$ with the added restriction that it can take a step only when there is an arrival and the step length is restricted to 1. Moreover, periodically the state $(x_i, y_i)$ gets perturbed in an uncontrollable manner by requests leaving the system. From related results in the nonlinear optimization literature [see, e.g., Luenberger (1984)], it follows that such a penalty-based control policy is likely to be successful provided the gradient of the penalty $\Psi_i$ is sufficiently “large” around the target state $(x_i^*, y_i^*)$, the step length of 1 is a “small” step in an appropriately defined norm and the frequency of correcting steps is sufficiently higher than the frequency of the perturbing steps (i.e., $\rho_i = \lambda_i/\mu_i \gg 1$). The relationship between penalty functions and nonlinear optimization is further discussed in Section 6.

In this article, we use a penalty function of the form

\[
\Psi_i(x_i, y_i) = \exp\Bigl(\beta \frac{b_i x_i}{c_i^0}\Bigr) + \exp\Bigl(\beta \frac{b_i y_i}{c_i^1}\Bigr).
\]

This choice is motivated by the fact that the exponential function is an eigenfunction of the underlying Markov process and that, for this choice, moment generating functions can be used to characterize the behavior of the penalty policy. Note that although the penalty method can be formulated without any reference to the rejected requests $y_i$, the form that we propose does not permit us to do so. In our penalty function we need $y_i$ to ensure that the number of accepted requests $x_i$ does not drop too low. In the rest of this section, we rigorously establish these informal ideas.

Since we are interested in approximating the upper bound (15), we drop from consideration all those classes with $\alpha_i^* = 0$. As proposed above, we add a fictitious infinite capacity system. We will refer to the original system as system 0 and the fictitious system as system 1. The state of the augmented network at time $t$ is $s(t) = (x(t), y(t)) \in \mathbb{Z}_+^{2m}$. The state vector $x(t) = (x_1(t), \ldots, x_m(t))$, where $x_i(t)$ is the number of class $i$ requests in system 0 at time $t$, describes the state of system 0. Similarly, $y(t) = (y_1(t), \ldots, y_m(t))$ describes the state of the fictitious system 1 at time $t$.

The state $s = (x, y)$ is assigned a penalty $\Psi(s)$ given by

\[
\Psi(s) = \sum_{i=1}^{m} \underbrace{\Bigl[ \exp\Bigl(\beta \frac{b_i x_i}{c_i^0}\Bigr) + \exp\Bigl(\beta \frac{b_i y_i}{c_i^1}\Bigr) \Bigr]}_{\Psi_i(s_i)}, \tag{16}
\]

where $(c^0, c^1) \in \mathbb{R}_+^{2m}$ and $s_i = (x_i, y_i)$ denotes the components of $s$ that correspond to class $i$. There are two competing requirements on the multiplier $\beta$: we need $\beta$ to be large to ensure that the penalty function $\Psi(s)$ is sufficiently steep; on the other hand, we also have to ensure that the impact of a single arrival or departure on the penalty value is sufficiently small. The precise bound on $\beta$ is given by (22). The capacities $(c^0, c^1)$ determine the “steady-state” target state of the penalty policy. As mentioned previously, we choose a fixed target because we are unable to establish that penalty policies can track time-varying targets. The transient performance is controlled by suitably initializing the fictitious system 1.

The penalty policy $\pi$ is defined as follows. Let $\{s(t) = (x(t), y(t)) : t \ge 0\}$ denote the state process under the control $\pi$. At time $t = 0^-$, the state of the original system is $x(0^-) = 0$, the state of the fictitious infinite capacity system 1 is initialized to $y(0^-)$ [the precise value of $y(0^-)$ is specified later] and a service time $S_i \sim \exp(\mu_i)$ is generated for each of the $y_i(0^-)$ class $i$ requests in system 1, $i = 1, \ldots, m$.

At time $t \ge 0$, an arriving class $i$ request is accepted by the control policy $\pi$ (i.e., routed to system 0) provided

\[
\frac{\partial \Psi_i(s_i(t))}{\partial x_i} \le \frac{\partial \Psi_i(s_i(t))}{\partial y_i} \tag{17}
\]

and the capacity constraint on system 0 is not violated, that is,

\[
\sum_{i'=1}^{m} b_{i'} x_{i'}(t) + b_i \le b; \tag{18}
\]

otherwise it is rejected (i.e., routed to system 1) and the policy $\pi$ attaches to it a service time $S_i \sim \exp(\mu_i)$ independent of everything else. Since the admission condition (17) is equivalent to

\[
\frac{x_i(t)}{c_i^0} \le \frac{y_i(t)}{c_i^1} + \frac{1}{\beta b_i} \log\Bigl(\frac{c_i^0}{c_i^1}\Bigr), \tag{19}
\]

it is clear that the policy $\pi$ is a threshold-type policy in the expanded state space $s = (x, y) \in \mathbb{Z}_+^{2m}$.
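
The admission test (17)-(18) amounts to comparing two partial derivatives of the exponential penalty and checking capacity. A minimal sketch follows; the function name and list-based state layout are assumptions made here, and it assumes $c_i^0 > 0$ and $c_i^1 > 0$ for the class being tested.

```python
import math


def penalty_admit(i, x, y, b_req, c0, c1, beta, b):
    """Decide whether an arriving class-i request is routed to system 0.

    x, y     -- current per-class counts in system 0 and fictitious system 1
    b_req    -- per-class capacity requests b_i
    c0, c1   -- penalty capacities (assumed strictly positive for class i)
    beta, b  -- penalty multiplier and total capacity
    Illustrative sketch of conditions (17) and (18).
    """
    # Capacity check (18): the request must fit in system 0.
    used = sum(b_req[k] * x[k] for k in range(len(x)))
    has_capacity = used + b_req[i] <= b
    # Gradient comparison (17) for the exponential penalty.
    grad_x = (beta * b_req[i] / c0[i]) * math.exp(beta * b_req[i] * x[i] / c0[i])
    grad_y = (beta * b_req[i] / c1[i]) * math.exp(beta * b_req[i] * y[i] / c1[i])
    return has_capacity and grad_x <= grad_y
```

With equal capacities $c_i^0 = c_i^1$, the rule reduces to admitting whenever $x_i \le y_i$ and capacity permits, which makes the threshold structure in the $(x_i, y_i)$ plane easy to see.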

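The capacities $(c^0, c^1)$, the parameter $\beta$ and the initial state $y(0^-)$ are defined in terms of a perturbation parameter $\epsilon \in (0, \tfrac{1}{4})$. Define an $\epsilon$-perturbation of the steady state LP (6) as

\[
\begin{aligned}
\text{maximize} \quad & \sum_{i=1}^{m} r_i \rho_i \alpha_i \\
\text{subject to} \quad & \sum_{i=1}^{m} b_i \rho_i \alpha_i \le \frac{b}{1 + 4\epsilon}, \\
& 0 \le \alpha_i \le 1, \quad i = 1, \ldots, m.
\end{aligned} \tag{20}
\]

Let $\alpha^\epsilon$ denote an optimal solution of this perturbed LP (20). Then the capacities $(c^0, c^1)$ are given by

\[
c_i^0 = (1 + 4\epsilon)\alpha_i^\epsilon b_i \rho_i, \qquad c_i^1 = (1 + 4\epsilon)(1 - \alpha_i^\epsilon) b_i \rho_i, \qquad i = 1, \ldots, m, \tag{21}
\]

and $\beta$ must satisfy

\[
\begin{aligned}
\beta &\le \epsilon \min\Bigl\{ \min_{1 \le i \le m} \frac{c_i^0}{b_i},\ \min_{i \in U^c} \frac{c_i^1}{b_i} \Bigr\} && (22) \\
&= \epsilon (1 + 4\epsilon) \min\Bigl\{ \min_{1 \le i \le m} \{\alpha_i^\epsilon \rho_i\},\ \min_{i \in U^c} \{(1 - \alpha_i^\epsilon)\rho_i\} \Bigr\}, && (23)
\end{aligned}
\]

where $U^c = \{i : \alpha_i^\epsilon < 1,\ i = 1, \ldots, m\}$. The bound (22) formalizes the notion that the change in the penalty value associated with a single arrival or departure must be small [the bounds (22) and (23) are identical]. Since the parameter $\beta$ must be sufficiently large for the penalty policy to perform well, the bound (23) implies that the penalty policy is likely to perform well when the incoming load $\rho_i \gg 1$. Although the request sizes $b_i$ are not explicitly present, the bounds (22) and (23) impose an implicit upper bound on the $b_i$'s via the capacity constraint $\sum_i b_i \rho_i \alpha_i \le b$.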
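
Given a solution $\alpha^\epsilon$ of the perturbed LP, the capacities (21) and the largest admissible multiplier in (23) follow by direct arithmetic. The sketch below assumes $\alpha^\epsilon$ is supplied by the caller and that every retained class has $\alpha_i^\epsilon > 0$; the function name is an assumption made here.

```python
def penalty_parameters(alpha_eps, b_req, rho, eps):
    """Compute (c0, c1) from (21) and the upper bound on beta from (23).

    alpha_eps -- an optimal solution of the perturbed LP (assumed given,
                 with alpha_eps[i] > 0 for every retained class)
    b_req     -- per-class capacity requests b_i
    rho       -- per-class offered loads
    eps       -- perturbation parameter
    """
    m = len(alpha_eps)
    scale = 1.0 + 4.0 * eps
    c0 = [scale * alpha_eps[i] * b_req[i] * rho[i] for i in range(m)]
    c1 = [scale * (1.0 - alpha_eps[i]) * b_req[i] * rho[i] for i in range(m)]
    # Candidates for the inner minimum in (23); classes with alpha = 1 only
    # contribute through the first term.
    candidates = [alpha_eps[i] * rho[i] for i in range(m)]
    candidates += [(1.0 - alpha_eps[i]) * rho[i] for i in range(m) if alpha_eps[i] < 1.0]
    beta_max = eps * scale * min(candidates)
    return c0, c1, beta_max
```

For a single class with $\alpha^\epsilon = 0.5$, $b_i = 2$, $\rho_i = 100$ and $\epsilon = 0.1$, this gives $c^0 = c^1 = 140$ and an admissible $\beta$ of at most 7, which illustrates how the bound on $\beta$ grows with the offered load.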

We establish a lower bound on the expected reward rate $E[R(t)]$ of the policy $\pi$ by comparing it to a related infeasible policy $\tilde\pi$. The policy $\tilde\pi$ is identical to $\pi$ except that it does not respect the system 0 capacity constraints; that is, the policy $\tilde\pi$ routes an incoming class $i$ request to system 0 whenever

\[
\frac{\partial \Psi_i(\tilde s_i(t))}{\partial x_i} \le \frac{\partial \Psi_i(\tilde s_i(t))}{\partial y_i}, \tag{24}
\]

where $\{\tilde s(t) = (\tilde x(t), \tilde y(t)) : t \ge 0\}$ denotes the state process that corresponds to the policy $\tilde\pi$. Since the various request classes interact only through the capacity constraints, the policy $\tilde\pi$ controls each class independently.

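We establish a bound on the total derivative $(d/dt)E[\Psi(\tilde s(t))]$, which implies that if the initial state $\tilde y(0^-)$ is suitably chosen, the penalty $E[\Psi(\tilde s(t))]$ is a uniformly bounded function of time.

Lemma 1. Suppose $\epsilon < \tfrac{1}{4}$, $(c^0, c^1)$ are given by (21) and $\beta$ satisfies (22). Then, for all $i = 1, \ldots, m$ and $t \ge 0$,

\[
\frac{d}{dt} E[\Psi_i(\tilde s_i(t))] \le (1 - \epsilon)\mu_i \bigl( 2e^{(1 - \epsilon/2)\beta} - E[\Psi_i(\tilde s_i(t))] \bigr).
\]

Proof. Fix a request class $i$. Define $E_t[\Psi_i(\tilde s_i(u))] = E[\Psi_i(\tilde s_i(u)) \mid \mathcal{F}_t]$, $u \ge t$, where $\mathcal{F}_t$ is the filtration generated by events up to $t$. Then

\[
\frac{d}{dt} E_t[\Psi_i(\tilde s_i(t))] = \mathcal{A}\Psi_i(\tilde s_i(t)),
\]

where $\mathcal{A}$ is the generator of the stochastic process $\{\tilde s(t) : t \ge 0\}$. Let $\tilde\pi_i(t)$ denote the routing decision of the policy $\tilde\pi$ at time $t$, that is,

\[
\tilde\pi_i(t) = \begin{cases} 1, & \dfrac{\partial \Psi_i}{\partial x_i} \le \dfrac{\partial \Psi_i}{\partial y_i}, \\ 0, & \text{otherwise.} \end{cases}
\]

Then

\[
\begin{aligned}
\mathcal{A}\Psi_i(\tilde s_i(t)) = {} & \lambda_i \bigl[ (\Psi_i(x_i + \tilde\pi_i(t), y_i) - \Psi_i(x_i, y_i)) + (\Psi_i(x_i, y_i + (1 - \tilde\pi_i(t))) - \Psi_i(x_i, y_i)) \bigr] \\
& + \mu_i \bigl[ x_i (\Psi_i(x_i - 1, y_i) - \Psi_i(x_i, y_i)) + y_i (\Psi_i(x_i, y_i - 1) - \Psi_i(x_i, y_i)) \bigr],
\end{aligned}
\]

where we have suppressed the time dependence of $(x_i, y_i)$. From the Taylor series expansion, it follows that $e^x \le 1 + x + x^2$ for all $|x| \le 1$, and from the bound (22) we have that $\max\{\beta b_i/c_i^0, \beta b_i/c_i^1\} \le \epsilon$. Therefore,

\[
\mathcal{A}\Psi_i(\tilde s_i(t)) \le (1 + \epsilon)\mu_i \Bigl[ \frac{\partial \Psi_i}{\partial x_i} \tilde\pi_i(t)\rho_i + \frac{\partial \Psi_i}{\partial y_i} (1 - \tilde\pi_i(t))\rho_i \Bigr] - (1 - \epsilon)\mu_i \Bigl[ \frac{\partial \Psi_i}{\partial x_i} x_i(t) + \frac{\partial \Psi_i}{\partial y_i} y_i(t) \Bigr].
\]

Since $\tilde\pi_i(t)$ minimizes the increase in penalty, it follows that

\[
\frac{\partial \Psi_i}{\partial x_i} \tilde\pi_i(t)\rho_i + \frac{\partial \Psi_i}{\partial y_i} (1 - \tilde\pi_i(t))\rho_i \le \frac{\partial \Psi_i}{\partial x_i} x_i^\epsilon + \frac{\partial \Psi_i}{\partial y_i} y_i^\epsilon
\]

for any $x_i^\epsilon + y_i^\epsilon = \rho_i$, $x_i^\epsilon, y_i^\epsilon \ge 0$. In particular, choose

\[
x_i^\epsilon = \alpha_i^\epsilon \rho_i, \qquad y_i^\epsilon = (1 - \alpha_i^\epsilon)\rho_i. \tag{25}
\]

Then, writing $s_i^\epsilon = (x_i^\epsilon, y_i^\epsilon)$, we have

\[
\begin{aligned}
\mathcal{A}\Psi_i(\tilde s_i(t)) &\le (1 + \epsilon)\mu_i \Bigl[ \frac{\partial \Psi_i}{\partial x_i} x_i^\epsilon + \frac{\partial \Psi_i}{\partial y_i} y_i^\epsilon \Bigr] - (1 - \epsilon)\mu_i \Bigl[ \frac{\partial \Psi_i}{\partial x_i} x_i(t) + \frac{\partial \Psi_i}{\partial y_i} y_i(t) \Bigr] \\
&= (1 - \epsilon)\mu_i \Bigl[ \frac{\partial \Psi_i}{\partial x_i} \Bigl( \frac{1 + \epsilon}{1 - \epsilon} x_i^\epsilon - x_i(t) \Bigr) + \frac{\partial \Psi_i}{\partial y_i} \Bigl( \frac{1 + \epsilon}{1 - \epsilon} y_i^\epsilon - y_i(t) \Bigr) \Bigr] \\
&\le (1 - \epsilon)\mu_i \Bigl[ \Psi_i\Bigl( \frac{1 + \epsilon}{1 - \epsilon} s_i^\epsilon \Bigr) - \Psi_i(s_i) \Bigr] && (26) \\
&\le (1 - \epsilon)\mu_i \bigl[ \Psi_i((1 + 3\epsilon) s_i^\epsilon) - \Psi_i(s_i) \bigr], && (27)
\end{aligned}
\]

where (26) follows from the convexity of $\Psi_i$ and (27) holds because $\frac{1 + \epsilon}{1 - \epsilon} \le 1 + 3\epsilon$ for all $\epsilon \le \tfrac{1}{3}$. From (21) and (25), it follows that $(1 + 3\epsilon) \max\{b_i x_i^\epsilon/c_i^0,\ b_i y_i^\epsilon/c_i^1\} \le 1 - \epsilon/2$. Consequently,

\[
\frac{d}{dt} E_t[\Psi_i(\tilde s_i(t))] \le (1 - \epsilon)\mu_i \bigl( 2e^{(1 - \epsilon/2)\beta} - \Psi_i(\tilde s_i(t)) \bigr).
\]

The result can now be concluded from the Lebesgue bounded convergence theorem by recognizing that for all $s > t$ sufficiently close to $t$, $(E_t[\Psi_i(\tilde s_i(s))] - \Psi_i(\tilde s_i(t)))/(s - t)$ can be bounded above by a fixed random variable. □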


Lemma 2. Suppose $\epsilon < \tfrac{1}{4}$, $(c^0, c^1)$ are given by (21), $\beta$ satisfies (22) and the initial state $\tilde s_i(0^-) = (0, \tilde y_i(0^-))$ satisfies $\Psi_i(\tilde s_i(0^-)) \le 2\exp((1 - \epsilon/2)\beta)$, $i = 1, \ldots, m$. Then, for all $i = 1, \ldots, m$ and $t \ge 0$,

\[
E[\Psi_i(\tilde s_i(t))] \le 2e^{(1 - \epsilon/2)\beta}. \tag{28}
\]

Proof. Fix a request class $i$. Suppose the conclusion does not hold. Define $f_i(t) = E[\Psi_i(\tilde s_i(t))]$ and $f^* = 2\exp((1 - \epsilon/2)\beta)$. Then Lemma 1 implies that $\frac{df_i(t)}{dt} \le (1 - \epsilon)\mu_i (f^* - f_i(t))$.

Let $\tau$ be any time instant when $f_i(\tau) > f^*$. Since $f_i(t)$ is a continuous function of $t$ and $f_i(0^-) \le f^*$, there exists $s < \tau$ such that $f_i(s) = f^*$ and $f_i(t) > f^*$ for all $s < t \le \tau$. By construction, $f_i(\tau) > f^* = f_i(s)$, but by the fundamental theorem of calculus, we have

\[
f_i(\tau) - f_i(s) = \int_s^\tau \frac{df_i(u)}{du}\,du \le \int_s^\tau (1 - \epsilon)\mu_i (f^* - f_i(u))\,du \le 0,
\]

a contradiction. □
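
The same differential inequality can also be read quantitatively. The following is a sketch under the assumption that $f_i$ is differentiable, with the shorthand $k = (1 - \epsilon)\mu_i$ introduced here for convenience; it is a standard integrating-factor argument rather than a step taken from the proof above.

```latex
\[
\frac{d}{dt}\Bigl( e^{kt}\,(f_i(t) - f^*) \Bigr)
  = e^{kt}\Bigl( f_i'(t) - k\,(f^* - f_i(t)) \Bigr) \le 0
\quad\Longrightarrow\quad
f_i(t) \le f^* + \bigl(f_i(0) - f^*\bigr)\,e^{-kt}, \qquad t \ge 0 .
\]
```

In particular, if $f_i(0) \le f^*$ then $f_i(t) \le f^*$ for all $t \ge 0$, and any initial excess over $f^*$ would decay at rate at least $k$.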


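The bound (28) implies the following results.

Lemma 3. Suppose $\epsilon < \tfrac{1}{4}$, $(c^0, c^1)$ are given by (21) and $\beta$ satisfies (22).

(i) Let $\tilde w(t) = \sum_{i=1}^{m} b_i \tilde x_i(t)$ and suppose $\Psi_i(\tilde s_i(0^-)) \le 2\exp((1 - \epsilon/2)\beta)$, $i = 1, \ldots, m$. Then

\[
E[(\tilde w(t) - b)^+] \le \frac{b}{\beta} \log\bigl( 1 + 2e^{-(\epsilon/2)\beta} \bigr). \tag{29}
\]

(ii) Suppose $\tilde y_i(0^-) = (1 - \alpha_i^\epsilon)\rho_i$, $i = 1, \ldots, m$. Then the reward rate $\tilde R(t)$ of the policy $\tilde\pi$ satisfies

\[
E[\tilde R(t)] \ge \sum_{i=1}^{m} \alpha_i^\epsilon r_i \rho_i (1 - \exp(-\mu_i t)) - \zeta \sum_{i=1}^{m} (1 - \alpha_i^\epsilon) r_i \rho_i, \tag{30}
\]

where $\alpha^\epsilon$ is an optimal solution of the perturbed LP (20) and

\[
\zeta = \Bigl( \frac{\log 2}{\beta} + 1 - \frac{\epsilon}{2} \Bigr)(1 + 4\epsilon) - 1.
\]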


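Proof. Let $D_t = \{\tilde w(t) = \sum_{i=1}^m b_i \tilde x_i(t) > b\}$. Then

\[
\begin{aligned}
\exp\Bigl( \frac{\beta}{b} \cdot E[(\tilde w(t) - b)^+] \Bigr) &\le E\Bigl[ \exp\Bigl( \frac{\beta}{b} (\tilde w(t) - b)^+ \Bigr) \Bigr] && (31) \\
&= P(D_t^c) + E\Bigl[ \exp\Bigl( \frac{\beta}{b} \cdot (\tilde w(t) - b) \Bigr); D_t \Bigr] \\
&\le 1 + E\Bigl[ \exp\Bigl( \frac{\beta}{b} \cdot (\tilde w(t) - b) \Bigr) \Bigr] \\
&= 1 + e^{-\beta} \prod_{1 \le i \le m} E\Bigl[ \exp\Bigl( \beta \frac{b_i \tilde x_i(t)}{b} \Bigr) \Bigr], && (32)
\end{aligned}
\]

where (31) follows from Jensen's inequality. Moreover,

\[
\begin{aligned}
E\Bigl[ \exp\Bigl( \beta \frac{b_i \tilde x_i(t)}{b} \Bigr) \Bigr] &= E\Bigl[ \Bigl( \exp\Bigl( \beta \frac{b_i \tilde x_i(t)}{c_i^0} \Bigr) \Bigr)^{c_i^0/b} \Bigr] \\
&\le \Bigl( E\Bigl[ \exp\Bigl( \beta \frac{b_i \tilde x_i(t)}{c_i^0} \Bigr) \Bigr] \Bigr)^{c_i^0/b} && (33) \\
&\le \bigl( E[\Psi_i(\tilde s_i(t))] \bigr)^{c_i^0/b} && (34) \\
&\le \bigl( 2e^{(1 - \epsilon/2)\beta} \bigr)^{c_i^0/b}, && (35)
\end{aligned}
\]

where (33) follows from Jensen's inequality applied to the concave function $x^a$, $a \le 1$; (34) holds because $x^a$ is monotonically increasing for $a \ge 0$; and (35) follows from (28). From (32) and (35), we have

\[
\begin{aligned}
\exp\Bigl( \frac{\beta}{b} E[(\tilde w(t) - b)^+] \Bigr) &\le 1 + e^{-\beta} \prod_{1 \le i \le m} \bigl( 2e^{(1 - \epsilon/2)\beta} \bigr)^{c_i^0/b} \\
&\le 1 + e^{-\beta} \bigl( 2e^{(1 - \epsilon/2)\beta} \bigr) && (36) \\
&\le 1 + 2e^{-(\epsilon/2)\beta},
\end{aligned}
\]

where (36) follows from the bound $\sum_{i=1}^m c_i^0 = (1 + 4\epsilon) \sum_{i=1}^m b_i \rho_i \alpha_i^\epsilon \le (1 + 4\epsilon)\bigl(\frac{b}{1 + 4\epsilon}\bigr) = b$. Part (i) follows by taking logarithms.

A similar argument establishes that

\[
\exp\Bigl( \beta \cdot \frac{b_i E[\tilde y_i(t)]}{c_i^1} \Bigr) \le 2e^{(1 - \epsilon/2)\beta}.
\]

Therefore,

\[
E[\tilde y_i(t)] \le \Bigl( \frac{\log 2}{\beta} + 1 - \frac{\epsilon}{2} \Bigr) \frac{c_i^1}{b_i} \le (1 + \zeta)(1 - \alpha_i^\epsilon)\rho_i, \tag{37}
\]

where $\zeta = \bigl( \frac{\log 2}{\beta} + 1 - \frac{\epsilon}{2} \bigr)(1 + 4\epsilon) - 1$.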

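Let $q_i(t)$ denote the number of class $i$ requests at time $t$ in an infinite capacity system with no admission control and let $y_i^0(t)$ denote the number of requests surviving from the $\bar y_i(0^-)$ class $i$ requests initially loaded into system 1. Then conservation implies

$$q_i(t) + y_i^0(t) \stackrel{d}{=} \bar x_i(t) + \bar y_i(t), \tag{38}$$

where $\stackrel{d}{=}$ denotes equality in distribution. [Note that the surviving requests $y_i^0(t)$ are also counted as part of $\bar y_i(t)$.] Suppose the initial load $\bar y_i(0^-) = (1-\alpha_i^\epsilon)\rho_i$, $i = 1,\dots,m$. Then

$$\frac{\bar y_i(0^-)}{c_i^1} = \frac{1}{1+4\epsilon} \le 1 - \frac{\epsilon}{2} \qquad \forall\, i = 1,\dots,m;$$

that is, the hypothesis of Lemma 2 holds for all $i = 1,\dots,m$. Therefore, (37) and (38) imply that

$$\begin{aligned}
E[\bar x_i(t)] &\ge \rho_i(1 - \exp(-\mu_i t)) + (1-\alpha_i^\epsilon)\rho_i\exp(-\mu_i t) - (1+\zeta)(1-\alpha_i^\epsilon)\rho_i \\
&= \alpha_i^\epsilon\rho_i(1 - \exp(-\mu_i t)) - \zeta(1-\alpha_i^\epsilon)\rho_i.
\end{aligned} \tag{39}$$

Thus,

$$E[\bar R(t)] = \sum_{i=1}^m r_i E[\bar x_i(t)] \ge \sum_{i=1}^m \alpha_i^\epsilon r_i\rho_i(1 - \exp(-\mu_i t)) - \zeta\sum_{i=1}^m (1-\alpha_i^\epsilon) r_i\rho_i. \tag{40}$$

□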


Lemma 3 establishes that if $\beta \gg 1$ is admissible, the policy $\bar\pi$ does not significantly violate the capacity constraint and the associated reward rate $E[\bar R(t)]$ is close to the upper bound (15). The following result establishes that, on average, the policy $\bar\pi$ admits more requests than $\pi$.

Lemma 4. Fix $\epsilon$, $\beta$, $(c^0, c^1)$ and the initial state $y(0^-) = \bar y(0^-)$. Let $\pi$ and $\bar\pi$ be the policies that correspond to these parameters. Then

$$x_i(t) \stackrel{d}{\le} \bar x_i(t), \qquad \bar y_i(t) \stackrel{d}{\le} y_i(t), \qquad i = 1,\dots,m,$$

where $X \stackrel{d}{\le} Y$ denotes that, for all $u \ge 0$, we have $P(X > u) \le P(Y > u)$.

Proof. The result is established by a coupling argument that employs another infeasible policy $\tilde\pi$ as a comparison policy.

The policies $\pi$, $\bar\pi$ and $\tilde\pi$ act on the same labeled Poisson arrival streams. Let the $k$th class $i$ arrival be labeled $(i,k)$. Let $X_i(t)$ [resp. $\tilde X_i(t)$] denote the set of labels of all class $i$ requests routed to system 0 by policy $\pi$ (resp. $\tilde\pi$) and still in service at time $t$.

The routing decision of the comparison policy $\tilde\pi$ is identical to that of the policy $\bar\pi$ unless policy $\bar\pi$ routes to system 1 (i.e., rejects) but policy $\pi$ routes the arrival to system 0 (i.e., accepts). Let $t$ be any time instant when this event occurs and suppose the arriving request has the label $(i,k)$. Since the policy $\bar\pi$ does not face any capacity constraints, it must be that $\tilde x_i(t^-) > x_i(t^-)$, that is, there exists a request with label $(i,l) \in \tilde X_i(t) \setminus X_i(t)$. The policy $\tilde\pi$ admits the incoming request $(i,k)$ into system 0 by relabeling it $(i,l)$ and moves the job previously labeled $(i,l)$ to system 1 and relabels it $(i,k)$. Clearly the policy $\tilde\pi$ is infeasible since requests once routed to system 0 cannot be removed.

From the definition of the policy $\tilde\pi$ it is clear that $\tilde x_i(t) \ge x_i(t)$ and $\tilde y_i(t) \le y_i(t)$. Notice that every time the policy $\tilde\pi$ removes a request before completion, the remaining service duration is $\exp(\mu_i)$, that is, the service duration of the request that replaces the removed request is, in distribution, identical to the remaining service duration. Therefore, the performance of the policy $\tilde\pi$ is, in distribution, identical to that of the policy $\bar\pi$. Thus, for all $u \ge 0$, we have

$$P(\bar x_i(t) > u) = P(\tilde x_i(t) > u) \ge P(x_i(t) > u),$$

$$P(\bar y_i(t) > u) = P(\tilde y_i(t) > u) \le P(y_i(t) > u). \qquad □$$


Let $\xi_i(t)$ [resp. $\eta_i(t)$] denote the number of class $i$ requests in system 1 at time $t$ that were rejected by the penalty function (resp. the capacity constraint). The expected value $E[\xi_i(t)]$ is bounded as follows.


Jo \ oxi dyi ) 


= / A,P 


Xiiu) yi{u) 


1 


1 

fJbi \cl 
(41)


(42) 


< 


XiP 


i.(u) 1 


Ci cj ' f3bi 


= / A,P 


= P[yi{t)], 


dxi dui 


du 


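where (41) follows from $x_i(u)/c_i^0 - y_i(u)/c_i^1 \le \bar x_i(u)/c_i^0 - \bar y_i(u)/c_i^1$. The expected value $E[\eta_i(t)]$ is bounded as follows:

$$\begin{aligned}
E[\eta_i(t)] &\le \int_0^t \lambda_i\, P\Big(\sum_{j=1}^m b_j x_j(u) > b - b_i\Big)\exp(-\mu_i(t-u))\,du \\
&\le \int_0^t \lambda_i\, P\Big(\sum_{j=1}^m b_j \bar x_j(u) > b - b_i\Big)\exp(-\mu_i(t-u))\,du && (43)\\
&\le 2e^{-\beta(1-b_i/b)}e^{\beta(1-\epsilon/2)}\int_0^t \lambda_i\exp(-\mu_i(t-u))\,du && (44)\\
&\le 2\rho_i e^{-(\epsilon/2)(\beta-4)}(1 - \exp(-\mu_i t)), && (45)
\end{aligned}$$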


where (43) follows from Lemma 4, (44) follows from an argument similar to 
that in the proof of part (i) of Lemma 3 and (45) follows from the bound 
on bi implied by (23). From (42) and (45) it follows that 

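$$\begin{aligned}
E[x_i(t)] &= E[q_i(t)] + E[y_i^0(t)] - E[y_i(t)] \\
&= E[q_i(t)] + E[y_i^0(t)] - \big(E[\xi_i(t)] + E[\eta_i(t)]\big) \\
&\ge E[q_i(t)] + E[y_i^0(t)] - E[\bar y_i(t)] - 2\rho_i e^{-(\epsilon/2)(\beta-4)}(1 - \exp(-\mu_i t)) \\
&= E[\bar x_i(t)] - 2\rho_i e^{-(\epsilon/2)(\beta-4)}(1 - \exp(-\mu_i t)) \\
&\ge \alpha_i^\epsilon\rho_i(1 - \exp(-\mu_i t)) - \zeta(1-\alpha_i^\epsilon)\rho_i - 2\rho_i e^{-(\epsilon/2)(\beta-4)}(1 - \exp(-\mu_i t)),
\end{aligned} \tag{46}$$

where (46) follows from the bound (39) and $\zeta = \big(\frac{\log(2)}{\beta} + 1 - \frac{\epsilon}{2}\big)(1+4\epsilon) - 1$. Thus, we have the following result.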

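Theorem 2. Suppose $\epsilon \le \frac14$, $(c^0, c^1)$ are given by (21), $\beta$ satisfies (22) and the initial state $s(0^-) = (0, y(0^-))$, with $y_i(0^-) = (1-\alpha_i^\epsilon)\rho_i$, $i = 1,\dots,m$. Then the reward rate $R(t)$ of the penalty policy $\pi$ satisfies

$$E[R(t)] \ge \max\Big\{\sum_{i=1}^m \alpha_i^\epsilon r_i\rho_i(1 - \exp(-\mu_i t)) - \zeta\sum_{i=1}^m (1-\alpha_i^\epsilon) r_i\rho_i - 2e^{-(\epsilon/2)(\beta-4)}\sum_{i=1}^m r_i\rho_i(1 - \exp(-\mu_i t)),\ 0\Big\}, \tag{47}$$

where $\alpha^\epsilon$ is an optimal solution of the perturbed LP (20) and $\zeta = \big(\frac{\log(2)}{\beta} + 1 - \frac{\epsilon}{2}\big)(1+4\epsilon) - 1$.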

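Let $L(t)$ denote the lower bound in (47). Then (15) and (47) imply that

$$\frac{\lim_{t\to\infty} L(t)}{R^*} \ge \frac{\sum_{i=1}^m \alpha_i^\epsilon r_i\rho_i - \zeta\sum_{i=1}^m (1-\alpha_i^\epsilon) r_i\rho_i - 2e^{-(\epsilon/2)(\beta-4)}\sum_{i=1}^m r_i\rho_i}{R^*}. \tag{48}$$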


Recall that $(u^*, v^*)$ denotes an optimal solution of dual LP (9). From the duality theory for LPs it follows that $(u^*, v^*)$ is optimal for the dual of the perturbed LP (20) for all sufficiently small $\epsilon$ [Luenberger (1984)], that is,

$$\epsilon_0 = \max\{\epsilon : (u^*, v^*) \text{ is optimal for the dual of (20)}\} > 0. \tag{49}$$

Thus, for all $\epsilon \le \epsilon_0$,

$$\sum_{i=1}^m \alpha_i^\epsilon r_i\rho_i = \sum_{i=1}^m v_i^* + \frac{u^* b}{1+4\epsilon} = R^* - \frac{4\epsilon}{1+4\epsilon}\,u^* b. \tag{50}$$

Since C < 8e + ( 4 g^ a,nd (50) imply the following. 


Corollary 1. Suppose $\epsilon \le \min\{\epsilon_0, \frac14\}$, where $\epsilon_0$ is given by (49), $(c^0, c^1)$ are given by (21), $\beta$ satisfies (22) and $y_i(0^-) = (1-\alpha_i^\epsilon)\rho_i$, $i = 1,\dots,m$. Then $L = \lim_{t\to\infty} L(t)$ satisfies


(51) 


L 


>1-I2e 


21og(2) 


/? 


+ 8e + 


2\og{2) \ YT=inpi 

(5 ) R* 


The term $\sum_{i=1}^m r_i\rho_i / R^*$ in (51) would appear, at first glance, to be large. However, recall that we had dropped from consideration all classes with $\alpha_i^* = 0$; therefore, $\sum_{i=1}^m r_i\rho_i = \sum_{\{i :\, \alpha_i^* > 0\}} r_i\rho_i$, that is, it is the total incoming revenue rate of only the admitted classes.

Since $\epsilon$ and $\beta$ cannot be chosen independently, the lower bound (51) implies that for every given load $\rho$ there is an optimal $\epsilon^*(\rho)$ and a corresponding optimal lower bound $L^*(\rho)$. The bound $L^*(\rho)/R^* \to 1$ as $\rho \uparrow \infty$, that is, the penalty policy is optimal in the Halfin-Whitt limiting regime. This limiting result is further discussed in Section 3.3.

Next, we numerically compare the transient performance of the penalty policy $\pi$ with the upper bound (15) for a three-class admission control problem defined by


(52) 






/ 1 . 00 \ / 0 . 10 \ 

0.25 , b= 0.15 , 6= 100. 

VO.75/ VO.55/ 


The optimal solution of the corresponding steady state LP (6) is $\alpha^* = [1, 1, 0.7818]^T$ and the optimal steady state reward is $R^* = 207.2727$. The approximation parameter $\epsilon$ was chosen by setting $\beta$ equal to the upper bound (23) and optimizing the bound (51) as a function of $\epsilon$. The row marked Scale $\eta = 1$ in Table 1 displays the optimal $\epsilon$, and the steady state and transient error of the optimized penalty policy. Since the lower bound $L(t) = 0$ for all sufficiently small $t$ [i.e., the error $1 - L(t)/R^*(t)$ is 100%], we defined the transient error as $\max\{1 - L(t)/R^*(t) : t \ge 0.1/\mu_{\min}\}$.

These numerical computations were repeated for the scaled admission control problem defined by $\lambda^{(k)} = k\lambda$ and $b^{(k)} = \frac{1}{k}b$. The corresponding results are shown in the row marked Scale $\eta = k$ in Table 1.

From the numerical results, it is clear that as the load $\rho \uparrow \infty$, both the steady state and the transient error improve. Although the steady state error appears to converge to zero, the transient error appears to level off at approximately 15%. We believe that this is a consequence of the fact that the “target” $(c^0, c^1)$ is fixed instead of time-varying.

Table 1
Comparison of bounds

                              Error (%)
Scale $\eta$   Optimal $\epsilon$   Steady state   Transient
        1        0.2500         51.3195       88.6202
        2        0.2500         21.8708       61.7278
        4        0.1838         17.1644       48.7918
        8        0.1422         12.7112       39.3613
       16        0.1100          9.3599       32.2373
       32        0.0851          6.8943       26.9023
       64        0.0659          5.1143       22.9311
      128        0.0437          4.0341       19.2897
      256        0.0338          2.8049       17.0118
      512        0.0236          2.1991       15.2632
     1024        0.0183          1.4909       14.1900

Regressing the scale $\eta$ on the steady state error $L$, we obtain that

(53) t/ = 4157.1L-2-^i°^ 

This power law paints quite a dismal picture: for steady state performance 
within 1% of the upper bound, the load p = O(IO^). Thus, the lower bound (51) 
suggests that the penalty policy is impractical for all but a small fraction 
of admission control applications. Fortunately, simulations (see Section 3.4) 
reassure us that the lower bound is quite weak and, in fact, the performance
of the penalty policy is close to the upper bound even for moderate loads.
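
A regression of this kind can be sketched directly from the entries of Table 1. The snippet below is only an illustration under stated assumptions: it fits $\log\eta$ against $\log L$ by ordinary least squares, uses all eleven rows, and measures $L$ in percent. The text does not say which rows or which fitting method produced (53), so the fitted coefficients need not coincide with those in (53); the function names are hypothetical.

```python
import math

# Rows of Table 1: (scale eta, steady state error L in percent).
TABLE_1 = [
    (1, 51.3195), (2, 21.8708), (4, 17.1644), (8, 12.7112),
    (16, 9.3599), (32, 6.8943), (64, 5.1143), (128, 4.0341),
    (256, 2.8049), (512, 2.1991), (1024, 1.4909),
]


def fit_power_law(rows):
    """Least-squares fit of log(eta) = log(a) + k * log(L); returns (a, k)."""
    xs = [math.log(err) for _, err in rows]
    ys = [math.log(eta) for eta, _ in rows]
    n = len(rows)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    k = sxy / sxx
    a = math.exp(y_bar - k * x_bar)
    return a, k


def scale_for_error(a, k, target_error_percent):
    """Scale eta predicted by the fitted power law eta = a * L**k."""
    return a * target_error_percent ** k


a, k = fit_power_law(TABLE_1)
# Scale that the fitted law predicts for a 1% steady state error.
eta_one_percent = scale_for_error(a, k, 1.0)
```

The fitted exponent `k` is negative, so a smaller target error requires a larger scale; `eta_one_percent` extrapolates beyond the largest scale in the table and should be read with that caveat.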

The numerical comparison of the bounds for a specific example is certainly 
not as conclusive and convincing as an analytical comparison. Nevertheless, 
we believe that the insights derived from this simple example would survive 
analytical scrutiny. 

3.3. Limiting regimes. In this section, we investigate the performance of the policy $\pi$ in the Halfin-Whitt limiting regime [Halfin and Whitt (1981)]. The regime of interest here is defined in terms of a scale parameter $n$ and the limiting regime is obtained as $n \uparrow \infty$. In the $n$th system,

$$\begin{aligned}
\text{system capacity} \quad & b^{(n)} = b, \\
\text{class } i \text{ arrival rate} \quad & \lambda_i^{(n)} = n\lambda_i, && i = 1,\dots,m, \\
\text{class } i \text{ service rate} \quad & \mu_i^{(n)} = \mu_i, && i = 1,\dots,m, \\
\text{request size} \quad & b_i^{(n)} = b_i/n, && i = 1,\dots,m, \\
\text{reward rate} \quad & r_i^{(n)} = r_i/n, && i = 1,\dots,m.
\end{aligned} \tag{54}$$

Note that the service rates remain constant, that is, the system exhibits transient behavior even in the limit. In the regime defined by (54) the incoming workload $b_i^{(n)}\rho_i^{(n)}$ and the total reward rate $r_i^{(n)}\rho_i^{(n)}$ of each request class $i = 1,\dots,m$ are independent of the scale parameter $n$, whereas the individual request size $b_i^{(n)}$ and reward rate $r_i^{(n)}$ scale down. An equivalent regime is one in which the request size remains constant but the system capacity scales up.
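
The scaling in (54) can be checked numerically with a short sketch. The class `LossSystem`, the helper names and the two-class numbers below are hypothetical and serve only as an illustration; the load is taken to be $\rho_i = \lambda_i/\mu_i$.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LossSystem:
    capacity: float             # b
    arrival_rates: List[float]  # lambda_i
    service_rates: List[float]  # mu_i
    request_sizes: List[float]  # b_i
    reward_rates: List[float]   # r_i


def scaled_system(base: LossSystem, n: int) -> LossSystem:
    """Return the n-th system of the regime (54)."""
    if n < 1:
        raise ValueError("scale parameter n must be at least 1")
    return LossSystem(
        capacity=base.capacity,                                # unchanged
        arrival_rates=[n * lam for lam in base.arrival_rates],  # n * lambda_i
        service_rates=list(base.service_rates),                # unchanged
        request_sizes=[b_i / n for b_i in base.request_sizes],  # b_i / n
        reward_rates=[r_i / n for r_i in base.reward_rates],    # r_i / n
    )


def workloads(system: LossSystem) -> List[float]:
    """Incoming workload b_i * rho_i of each class, rho_i = lambda_i / mu_i."""
    return [b_i * lam / mu for b_i, lam, mu in
            zip(system.request_sizes, system.arrival_rates, system.service_rates)]


def total_reward_rates(system: LossSystem) -> List[float]:
    """Total reward rate r_i * rho_i of each class."""
    return [r_i * lam / mu for r_i, lam, mu in
            zip(system.reward_rates, system.arrival_rates, system.service_rates)]


# Hypothetical two-class base system (illustrative numbers only).
base = LossSystem(capacity=100.0,
                  arrival_rates=[40.0, 10.0],
                  service_rates=[1.0, 0.5],
                  request_sizes=[1.0, 2.5],
                  reward_rates=[3.0, 8.0])
```

For any `n`, `workloads(scaled_system(base, n))` and `total_reward_rates(scaled_system(base, n))` agree with the values for `base` up to rounding, while the per-request size and reward shrink by the factor `n`.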

While it is plausible that appropriately thinning the incoming requests is a steady state optimal policy in the limit [Kelly (1991)], it is unlikely that thinning will perform well in the transient period. We show that the penalty policy $\pi$ is able to control transient behavior without sacrificing steady state performance.

We will need some notation and preliminary results to enable us to state the main result of this section. Let $\pi^{(n)}$ be any feasible policy for the $n$th system. Since $b_i^{(n)}\rho_i^{(n)} = b_i\rho_i$, for all $i = 1,\dots,m$, the upper bound in (15) is still valid, that is,

$$E\big[R^{\pi^{(n)}}(t)\big] \le \min\Big\{\sum_{i=1}^m r_i\rho_i(1 - \exp(-\mu_i t)),\ \sum_{i=1}^m r_i\rho_i\alpha_i^*(1 - \exp(-\mu_i t)) + u^* b\exp(-\mu_{\min} t)\Big\}. \tag{55}$$

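Duality theory for LP [Luenberger (1984)] guarantees that

$$\sum_{i=1}^m \alpha_i^\epsilon r_i\rho_i(1 - \exp(-\mu_i t)) \ge \sum_{i=1}^m \alpha_i^* r_i\rho_i(1 - \exp(-\mu_i t)) - O(\epsilon) \tag{56}$$

for all $\epsilon \le \epsilon_0$, where $\epsilon_0$ is given by (49). Fix $\epsilon \le \min\{\epsilon_0, \frac14\}$. Set $(c^0, c^1)$ using (21), set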

/3 = ^log0)+4 

and set

$$y_i(0^-) = (1-\alpha_i^\epsilon)\rho_i, \qquad i = 1,\dots,m.$$

Define 

(57) no(E) = min|n > 1: /3 = - log^-^ + 4 satisfies (23) |. 

Then, for all $n \ge n_0(\epsilon)$, the bounds (56) and (47) imply that

$$L(t) \ge \sum_{i=1}^m r_i\rho_i\alpha_i^*(1 - \exp(-\mu_i t)) - O(\epsilon). \tag{58}$$

Let $s^{(n)}(t) = (x^{(n)}(t), y^{(n)}(t))$ denote the state process and let $R^{(n)}(t)$ denote the reward rate that corresponds to $\pi$ in the $n$th system. Then
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where Df,Dy ); 1 = 1,..., m} are independent rate 1 Poisson pro¬ 

cesses, the departure rates (Oi ('))) i = 1, • ■ •, nr, are given by 


(60) 


M / \ 
%,i (s) = 


and the arrival rates {')Ay^i ('))) i = ■ ,m, are given by 


dxi ' % 
, 0, otherwise, 


nXi 




(61) 


<^ and 'S^bjXj(t)-\ — bi<b, 

rin- < ^ J J ^ 

j = l 


M ( \ 

(s) = 


n\i 


d^i 

> 


or 


dxi dyi 


1 


n 


. 0, otherwise. 

Fix time t and define Then 


(62) 


^n = E 


in)-{n) 
i 

i=l i=l 


r- 'x- ' < E r- 


{n)f b \_ 


S'^) 


=<-!:■ 


. i=l 


From the dynamics (59) it follows that 


var(X„) = E(' 




2=1 


var 


(63) 


+ var Df 




2=1 




The upper bounds (62) and (63) imply that the family of random variables 
{Xn : n > 1} is tight and all its limit points are nonrandom. 

To show that the sequence $\{X_n : n \ge 1\}$ has a limit, we need new notation. Let $X_q^p$ denote the reward rate at time $t$ when the policy $\pi$ is employed in an admission control problem where the arrival rates are $\lambda_i^{(p)} = p\lambda_i$, $i = 1,\dots,m$, the capacity is $qb$ and the individual rewards are unscaled. Then $X_n = \frac{1}{n}X_n^n$ and, for all $n \ge m$, one has the inequality

$$\begin{aligned}
E[X_n] = \frac{1}{n}E[X_n^n] &\ge \frac{1}{n}E[X_m^n] && (64)\\
&\ge \frac{1}{n}E[X_m^m] = \frac{m}{n}E[X_m]. && (65)
\end{aligned}$$

Intuitively, inequality (64) follows from the fact that the expected reward rate is a nondecreasing function of capacity, and (65) follows from the fact that, since no costs are incurred for rejecting customers, the expected reward is a nondecreasing function of the arrival rate. A formal proof of this statement requires a coupling argument very similar to that in Lemma 4.

Let 7 i, i = 1,2, denote two distinct limit points of the sequence {Xn : n > 
1} and choose subsequences Xn^ —> 7 i and Xm^ —> 72 - From (62), we have 
E[X„j,] —> 7 i and E[Xmj,] —> 72 - By possibly choosing subsequences, ensure 
that rufc + y/rrik > Then (65) implies that 71 > 72 . Since the order 

of the 7 j was arbitrary, it follows that 71 = 72 , that is, lim^^oo Ai^ = X, 
where X is nonrandom. Thus, we have the following result. 

Theorem 3. Suppose e < min{eo, \}, where eo is given by (49), (c°,c^) 
are given by (21), /? = | log(|) + 4 and yi{0~) = (1 — af)pi, i = 1,... ,m. 
Let $R^{(n)}(t)$ denote the reward rate of the policy $\pi$ in the $n$th system. Then $R^{(\infty)}(t) = \lim_{n\to\infty} R^{(n)}(t)$ exists a.s. and is nonrandom. Moreover,

$$R^{(\infty)}(t) \ge \sum_{i=1}^m r_i\rho_i\alpha_i^*(1 - \exp(-\mu_i t)) - O(\epsilon), \tag{66}$$

where $\alpha^*$ is an optimal solution of the LP (6).

Since the control is a discontinuous function of the state, we cannot assert that the process $\{R^{(n)}(t) : t \in [0,T]\}$ converges to the process $\{R^{(\infty)}(t) : t \in [0,T]\}$.

3.4. Numerical experiments. In this section we report the results of some preliminary simulation studies of the penalty policy. The objectives of these simulation experiments were to investigate the following:

(i) The quality of the lower bound (47): The numerical computations in Section 3.2 imply that the load must grow according to the power law (53) for the penalty policy to be able to achieve a steady state error of order $L$. If the lower bound were tight, this would imply that the penalty policy is impractical for all but a fraction of admission control applications. We compared the lower bound with simulated performance to evaluate the quality of the bound.

(ii) Comparison with the thinning policy [Kelly (1991)]: We compared the performance of the penalty and thinning policies in reward maximization and load balancing scenarios.
3.4.1. Comparison with bounds. We arbitrarily chose the following three
scenarios.


Scenario 1. 



Scenario 2. 



Scenario 3. 


(69) 




8 


6 

5 



/ 1 


0.25 

0.75 

VO.67/ 



/ 0.1 \ 

b= 0.015 , 6 = 1. 

V 0.055/ 



/ 0.01 \ 

b= 0.015 , 6 = 1. 

V 0.055/ 


/0.5\ 


2 


0.3 


V0.2y 


/ 0.02 


0.015 

0.055 

V 0.045/ 


For each of the scenarios, the optimal solution $\alpha^*$ and the maximum reward $R^*$ are determined by solving the LP (6). The approximation parameter $\epsilon$ was set to the value that minimized the steady state error (51) and $\beta$ was set equal to the bound (23). The performance of the penalty policy was simulated over the period $[0, t_{\max} = 10/\mu_{\min}]$ and the reward rates were averaged over $p = 100$ independent simulation runs. The simulation was repeated for scaled systems ($\lambda^{(n)} = n\lambda$, $b^{(n)} = \frac{1}{n}b$, $r^{(n)} = \frac{1}{n}r$), $n = 10, 100, 1000$ (see Section 3.3 for details).

Figures 1-3 compare the simulation estimates with the upper bound (15) and the lower bound (47) for the three scenarios. In the plots, the reward rate is normalized by $R^*$ and time is in units of $1/\mu_{\min}$.

Fig. 1. Comparison with bounds: Scenario 1 (panel titles include scale $n = 100$ and scale $n = 1000$).

Fig. 2. Comparison with bounds: Scenario 2.

From the plots, it is clear that the lower bound is quite weak, particularly so for small values of the scale parameter $n$. The performance of the penalty policy is, in fact, quite close to the upper bound. Although the transient performance of the penalty policy is significantly superior to the lower bound, it is clear that there remains a gap that needs to be bridged. Comparing the plots for different scales $n$, we see that the performance of the penalty policy is not very sensitive to the scale parameter $n$. In summary, the performance of the penalty policy, even for small loads, is remarkably good.

3.4.2. Comparison in reward maximization scenarios. The thinning policy is defined as follows [Kelly (1991)]. Let $\alpha^*$ denote an optimal solution of the steady state LP (6). The thinning policy admits an arriving class $i$ request with probability $\alpha_i^*$, provided there is adequate capacity to serve the request.
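
The thinning rule can be written as a few lines of code. The following is a minimal sketch, not the simulation code used for the experiments: the function name and signature are hypothetical, the single-resource capacity check is the simplest reading of "adequate capacity", and the numerical values are those quoted in the text for the three-class example of Section 3.2 ($\alpha^* = [1, 1, 0.7818]^T$, request sizes $(0.10, 0.15, 0.55)$, capacity 100).

```python
import random
from typing import Sequence


def thinning_admits(class_index: int,
                    alpha_star: Sequence[float],
                    request_sizes: Sequence[float],
                    occupied: float,
                    capacity: float,
                    rng: random.Random) -> bool:
    """Thinning rule: admit with probability alpha_star[i] if the request fits."""
    # Draw the coin first so the random stream does not depend on the state.
    coin = rng.random()  # uniform on [0, 1)
    fits = occupied + request_sizes[class_index] <= capacity
    return fits and coin < alpha_star[class_index]


# Values quoted in the text for the three-class example (assumed here).
ALPHA_STAR = [1.0, 1.0, 0.7818]
REQUEST_SIZES = [0.10, 0.15, 0.55]
CAPACITY = 100.0
```

Because the coin is drawn whether or not the request fits, two policies driven by the same `random.Random` seed see the same stream of uniform draws, which is one simple way to compare policies on a common sample path.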

Figures 4-6 plot the average performance of the penalty policy and the 
thinning policy as a function of the scale parameter n for the three scenarios. 
As before, the performance was simulated over the period $[0, t_{\max} = 10/\mu_{\min}]$
and reward rates averaged over $p = 100$ independent simulation runs. In
these simulation experiments both the penalty policy and the thinning policy 
saw the same sample path of Poisson arrivals. Also, a request accepted by 
both policies had the same service time in both cases. 

The simulation results suggest the following conclusions. The variance of 
the reward rate of the thinning policy is significantly larger than the variance 
of the reward rate of the penalty policy. This is particularly the case for small 
loads. As the load increases, the steady state behavior of the thinning and 
penalty policies converges; however, the penalty policy remains significantly 
superior in the transient period. 

3.4.3. Comparison with thinning in load balancing scenarios. The objective here is to maintain the load of the various classes close to a prescribed fraction $f$, that is, class $i$ load has to be maintained close to $bf_i$, $i = 1,\dots,m$. We considered the following two scenarios:

Scenario 4.

(70)    $b = 100$.

Scenario 5.

The two scenarios differ only in the fact that in Scenario 4, $\mu_1 = \mu_2$, whereas in Scenario 5, $\mu_2 = 10\mu_1$.

Fig. 3. Comparison with bounds: Scenario 3.

Fig. 4. Comparison with thinning policy: Scenario 1.

Fig. 5. Comparison with thinning policy: Scenario 2.

Fig. 6. Comparison with thinning policy: Scenario 3.

The load balancing is achieved via an appropriate admission control policy. Suppose a fraction $\alpha_i$ of all incoming class $i$ requests is admitted into the system. Then the steady state class $i$ load is $b_i\rho_i\alpha_i$. Thus, if $\alpha_i = bf_i/(b_i\rho_i)$, then the steady state class $i$ load will be $f_i b$. In this set of simulation experiments, we compared the performance of the thinning and penalty policies constructed from the computed admission ratio $\alpha$.
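
The admission ratio used for load balancing is a one-line computation. The sketch below is illustrative only: the function name is hypothetical, the numbers are made up, and the check that each ratio is at most 1 is an added assumption (a class cannot be admitted more often than it arrives).

```python
from typing import List, Sequence


def load_balancing_ratios(capacity: float,
                          fractions: Sequence[float],
                          request_sizes: Sequence[float],
                          loads: Sequence[float]) -> List[float]:
    """Admission ratios alpha_i = b * f_i / (b_i * rho_i)."""
    ratios = []
    for f_i, b_i, rho_i in zip(fractions, request_sizes, loads):
        alpha_i = capacity * f_i / (b_i * rho_i)
        if alpha_i > 1.0:
            # The target share f_i * b exceeds the offered workload b_i * rho_i.
            raise ValueError("target load exceeds the offered load of a class")
        ratios.append(alpha_i)
    return ratios


# Hypothetical two-class example (illustrative numbers only).
capacity = 100.0
fractions = [0.6, 0.4]        # f_i
request_sizes = [1.0, 2.0]    # b_i
loads = [100.0, 40.0]         # rho_i
alpha = load_balancing_ratios(capacity, fractions, request_sizes, loads)
# Resulting steady state loads b_i * rho_i * alpha_i.
steady_loads = [b_i * rho_i * a_i for b_i, rho_i, a_i in zip(request_sizes, loads, alpha)]
```

With these numbers the ratios are 0.6 and 0.5, and the steady state loads are 60 and 40, that is, $f_i b$ for each class.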

The results for the two scenarios are shown in Figures 7 and 8. The top 
plot corresponds to the penalty policy and the bottom plot corresponds 
to the thinning policy. In both plots, the x-axis is time (here time is not 
normalized) and the y-axis is the fraction of the resource utilized by the 
requests. As before, the results are averaged over p = 100 iterations. 

In steady state, the performance of the thinning and penalty policies is almost identical. However, the transient performance of the penalty policy is significantly superior to that of the thinning policy: In Scenario 5, where $\mu_1 \ne \mu_2$, the resource sharing that corresponds to the penalty policy reaches steady state levels at $t = 0.2 = 2/\mu_{\min}$, whereas the resource sharing associated with the thinning policy does not reach steady state levels even by $t = 2 = 20/\mu_{\min}$.

This example illustrates the target-tracking nature of the penalty policy. The policy merely tracks the target set by the capacities $(c^0, c^1)$. It is approximately optimal in the revenue maximization scenario because the LP sets an appropriate target to track. It could just as easily track a target set by other considerations.

3.5. General service times. In this section, we assume that the service duration $S_i$ has a general distribution with mean $1/\mu_i$, $i = 1,\dots,m$. Let $g_i$ denote the density and let $G_i$ denote the cumulative distribution function (CDF) of the service duration $S_i$, $i = 1,\dots,m$.

Since the steady state LP (6) and its dual (9) depend only on the mean service time $1/\mu_i$, they remain the same. As before, let $R^*$ denote the optimal value, let $\alpha^*$ denote an optimal solution of the primal LP (6) and let $(u^*, v^*)$ denote an optimal solution of the dual LP (9).

Let $q_i(t)$ denote the number of active class $i$ requests at time $t$ in an infinite capacity system with service time $S_i \sim g_i$ and no admission control. It is well known that [see, e.g., Wolff (1989)]

$$E[q_i(t)] = \rho_i(1 - \bar G_i^e(t)), \tag{72}$$

where $\bar G_i^e(t)$ is the tail of the equilibrium CDF of the class $i$ service distribution. Thus, $\bar G_i^e(t)$ plays the role of the tail $\exp(-\mu_i t)$ of the exponential service time distribution. This observation leads to the following extension of Theorem 1.

Fig. 7. Comparison in load balancing: Scenario 4 (top panel: performance of penalty policy; bottom panel: performance of thinning policy).

Fig. 8. Comparison in load balancing: Scenario 5 (top panel: performance of penalty policy; bottom panel: performance of thinning policy).


Theorem 4. The reward rate $R^\pi(t)$ of any feasible policy $\pi$ satisfies

$$E[R^\pi(t)] \le \min\Big\{\sum_{i=1}^m r_i\rho_i(1 - \bar G_i^e(t)),\ \sum_{i=1}^m r_i\rho_i\alpha_i^*(1 - \bar G_i^e(t)) + u^* b\Big(\max_{1\le i\le m}\bar G_i^e(t)\Big)\Big\}, \tag{73}$$

where $\alpha^*$ is an optimal solution of (6), $(u^*, v^*)$ is an optimal solution of (9) and $\bar G_i^e(\cdot)$ is the tail of the equilibrium CDF of the class $i$ service duration, $i = 1,\dots,m$.

Note that

$$\lim_{t\to\infty}\Big(\sum_{i=1}^m r_i\rho_i\alpha_i^*(1 - \bar G_i^e(t)) + u^* b\max_{1\le i\le m}\bar G_i^e(t)\Big) = \sum_{i=1}^m r_i\rho_i\alpha_i^* = R^*,$$

that is, the steady state reward rate of any admissible policy is bounded above by the optimal value of the steady state LP (6).

Remark 2. Note that in evaluating the upper bound (73), we use only the fact that the policy $\pi$ is feasible and use the bounds on the population of an $M/G/\infty$ queue [see, e.g., Wolff (1989)].


Next, we characterize the performance of the penalty policy $\pi$ in this model. Recall that admission decisions of the policy $\pi$ depend only on the load of requests of each class that have been assigned to the original system and the fictitious infinite capacity system. In particular, the policy does not keep track of the remaining service times of the requests in the system.

Let $g_i^t$ and $G_i^t$ denote, respectively, the density and the CDF of the remaining service time of a class $i$ request conditioned on the fact that it has been in service for $t$ time units. Then the tail

$$\bar G_i^t(s) = 1 - G_i^t(s) = \frac{G_i^e(t+s) - G_i^e(s)}{G_i^e(t)}, \tag{74}$$

where $G_i^e = 1 - \bar G_i^e$ denotes the equilibrium CDF, and, therefore,

$$\frac{dG_i^t(s)}{ds} = \frac{g_i^e(s) - g_i^e(t+s)}{G_i^e(t)}. \tag{75}$$

We make the following assumption about the rate function $g_i^t(0)$.

Assumption 1. The function $g_i^t(0)$ is a decreasing function of $t$ for all $i = 1,\dots,m$, that is, $g_i^t(0) \ge \lim_{u\to\infty} g_i^u(0) = g_i^e(0) = \mu_i$ for all $i = 1,\dots,m$.

Remark 3. The exponential distribution satisfies this assumption as 
does the heavy-tailed CDF G{s) = (1 — (1/(1 + •s)^))l{s > 0}. 

Under Assumption 1, we have the following analog of Theorem 2.

Theorem 5. Suppose $\epsilon \le \frac14$, $(c^0, c^1)$ are given by (21), $\beta$ satisfies (22) and $y_i(0^-) = (1-\alpha_i^\epsilon)\rho_i$, $i = 1,\dots,m$. Suppose also that Assumption 1 holds. Then the reward rate $R(t)$ of the penalty policy satisfies

m m 

E[R(t)] >'^ripialil - Gf(t)) - ^ripfil - af)(G|(t) - Gfit)) 

i=l i=l 

m m 

(76) - C - otlYiPi - ^ Vipfil - 

i=l i=l 

where ol^ is an optimal solution of the perturbed LP (20) and 

Remark 4. Unlike the lower bound (47), the bound (76) has a term $\sum_{i=1}^m r_i\rho_i(1-\alpha_i^\epsilon)(\bar G_i^e(t) - \bar G_i(t))$ that does not vanish as $\epsilon \to 0$, that is, no matter how small the request size, this error cannot be surmounted. This term appears because the policy $\pi$ does not account for the remaining service times of the requests in the system.

4. Extension to loss networks. In this section, we extend the results of Section 3 to the network model introduced in Section 2. Recall that the stochastic system under consideration consists of a network of $s$ resources with capacity $b \in \mathbb{R}_+^s$, where $b(k)$ is the capacity of resource $k = 1,\dots,s$, and the system is initially empty. Requests for using this network of resources belong to $m$ Poisson arrival classes. Class $i$ requests have an arrival rate $\lambda_i$ and a service duration $S_i \sim \exp(\mu_i)$. They are willing to accept any capacity allocation from the set $B_i = \{b_{i1},\dots,b_{il_i}\}$, $b_{ij} \in \mathbb{R}_+^s$, and pay $r_i$ per unit time for the period the request is in the system.

4.1. Upper bound on expected reward rate. Let $\pi$ be any feasible control policy for the stochastic problem. Let $x_{ij}^\pi(t)$ denote the number of class $i$ requests in the system at time $t$ that were assigned the capacity vector $b_{ij} \in B_i$.

The analog of (4) for the network setting is given by


(77) 


maximize 

i=l \j=l / 
m / k \ 

subject to E I — 

i=i \j=i / 




i=i 

oiij >0, j = 1,... ,/i, i = 1,... ,m. 

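Let $R^*(t)$ denote the optimal value of this LP. Taking the limit $t \to \infty$ in (77), we get the steady state LP

$$\begin{aligned}
\text{maximize} \quad & \sum_{i=1}^m r_i\rho_i\Big(\sum_{j=1}^{l_i}\alpha_{ij}\Big) \\
\text{subject to} \quad & \sum_{i=1}^m \rho_i\Big(\sum_{j=1}^{l_i}\alpha_{ij}b_{ij}\Big) \le b, \\
& \sum_{j=1}^{l_i}\alpha_{ij} \le 1, \qquad i = 1,\dots,m, \\
& \alpha_{ij} \ge 0, \qquad j = 1,\dots,l_i,\ i = 1,\dots,m.
\end{aligned} \tag{78}$$

Let $\alpha^* = (\alpha_{ij}^*)_{\{j=1,\dots,l_i,\, i=1,\dots,m\}}$ denote an optimal solution and let $R^*$ denote the optimal value of (78). The dual of the steady state LP is given by

$$\begin{aligned}
\text{minimize} \quad & b^T u + \mathbf{1}^T v \\
\text{subject to} \quad & \rho_i r_i \le v_i + \rho_i u^T b_{ij}, \qquad j = 1,\dots,l_i,\ i = 1,\dots,m, \\
& v \ge 0,\ u \ge 0.
\end{aligned} \tag{79}$$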

Let (u*,v*) denote an optimal solution of the dual LP (79). Then we have 
the following extension of Theorem 1. 

Theorem 6. The reward rate $R^\pi(t)$ of any feasible policy $\pi$ satisfies

$$E[R^\pi(t)] \le R^*(t) \le \min\Big\{\sum_{i=1}^m r_i\rho_i(1 - \exp(-\mu_i t)),\ \sum_{i=1}^m r_i\rho_i\alpha_i^*(1 - \exp(-\mu_i t)) + (u^*)^T b\exp(-\mu_{\min} t)\Big\}, \tag{80}$$

where $\alpha_i^* = \sum_{j=1}^{l_i}\alpha_{ij}^*$, $i = 1,\dots,m$, $\alpha^*$ is an optimal solution of steady state LP (78) and $(u^*, v^*)$ is an optimal solution of steady state dual LP (79).
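
The right-hand side of a bound of the form (80) is easy to evaluate once the LP has been solved. The sketch below is illustrative only: the function name is hypothetical, `alpha_star` holds the class-level totals $\alpha_i^* = \sum_j \alpha_{ij}^*$, `dual_value` stands for the scalar $(u^*)^T b$, and the two-class numbers are made up rather than taken from any scenario in the text.

```python
import math
from typing import Sequence


def reward_upper_bound(t: float,
                       rewards: Sequence[float],
                       loads: Sequence[float],
                       service_rates: Sequence[float],
                       alpha_star: Sequence[float],
                       dual_value: float) -> float:
    """Minimum of the two terms on the right-hand side of the transient bound."""
    growth = [1.0 - math.exp(-mu * t) for mu in service_rates]
    # First term: every class admitted in full, no capacity constraint.
    unconstrained = sum(r * rho * g for r, rho, g in zip(rewards, loads, growth))
    # Second term: LP solution plus a capacity term decaying at rate mu_min.
    lp_based = sum(r * rho * a * g
                   for r, rho, a, g in zip(rewards, loads, alpha_star, growth))
    lp_based += dual_value * math.exp(-min(service_rates) * t)
    return min(unconstrained, lp_based)


# Hypothetical two-class data (illustrative numbers only).
rewards = [2.0, 1.0]
loads = [10.0, 20.0]
service_rates = [1.0, 0.5]
alpha_star = [1.0, 0.5]
dual_value = 15.0
```

At $t = 0$ the first term vanishes, so the bound is 0; for large $t$ the exponential terms die out and the bound approaches $\sum_i r_i\rho_i\alpha_i^*$, which is 30 for these numbers.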


4.2. Penalty function and $\epsilon$-feasible control policy. As in the single-resource case, we drop from consideration all those capacity vectors $b_{ij}$ which have the corresponding $\alpha_{ij}^* = 0$ and augment the network of systems by adding one additional fictitious infinite capacity system. The state $s(t)$ of the augmented network is given by

$$s(t) = (x_1(t),\dots,x_m(t), y(t)). \tag{81}$$

The state vector

$$x_i(t) = (x_{i1}(t),\dots,x_{il_i}(t)) \in \mathbb{Z}_+^{l_i} \tag{82}$$

describes the accepted requests, where $x_{ij}(t)$ is the number of active class $i$ requests that have been assigned to $b_{ij} \in B_i$. The state vector $y(t) = (y_1(t),\dots,y_m(t)) \in \mathbb{Z}_+^m$, where $y_i(t)$ is the number of class $i$ requests in the fictitious system.

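The penalty function $\Psi(s)$ is given by

$$\Psi(s) = \sum_{i=1}^m\Big\{\sum_{k=1}^s \underbrace{\exp\Big(\beta\cdot\frac{\sum_{j=1}^{l_i} b_{ij}(k)\,x_{ij}}{c_{ik}^0}\Big)}_{\Psi_{ik}(x_i)} + \exp\Big(\beta\cdot\frac{y_i}{c_i^1}\Big)\Big\}, \tag{83}$$

where $\beta$, $(c_i^1, \{c_{ik}^0\}_{k=1}^s)$, $i = 1,\dots,m$, are appropriately chosen constants. Let $s_i = (x_i, y_i)$ denote the components of the state vector that correspond to class $i$, let $C^0 \in \mathbb{R}^{m\times s}$ denote the matrix $[c_{ik}^0]$ and let $c^1 \in \mathbb{R}^m$ denote the vector $(c_1^1,\dots,c_m^1)^T$.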

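The penalty policy $\pi$ for a loss network is defined as follows. Let $s(t) = (x_1(t),\dots,x_m(t), y(t))$ denote the stochastic state process that corresponds to the policy $\pi$ and let $s_i = (x_i, y_i)$. At time $t = 0^-$, the policy loads the infinite capacity system to the level $y(0^-)$. An incoming class $i$ request is conditionally accepted if

$$\min_{1\le j\le l_i}\Big\{\sum_{k=1}^s \frac{\partial\Psi_{ik}}{\partial x_{ij}}\Big\} \le \frac{\partial\Psi_i}{\partial y_i}.$$

A conditionally accepted request is accepted and assigned to $b_{ij} \in B_i$ provided

$$j \in \arg\min_{1\le j'\le l_i}\Big\{\sum_{k=1}^s \frac{\partial\Psi_{ik}}{\partial x_{ij'}}\Big\}$$

and there is adequate capacity [i.e., $\sum_{i',j'} b_{i'j'}x_{i'j'}(t) + b_{ij} \le b$]. Otherwise the request is routed to the fictitious system and is assigned a service duration $S_i \sim \exp(\mu_i)$ that is independent of everything else.

As in the case of the single-resource problem discussed in Section 3, the capacities $(C^0, c^1)$ determine the following perturbed version of the steady state LP (78):

$$\begin{aligned}
\text{maximize} \quad & \sum_{i=1}^m r_i\rho_i\Big(\sum_{j=1}^{l_i}\alpha_{ij}\Big) \\
\text{subject to} \quad & \sum_{i=1}^m \rho_i\Big(\sum_{j=1}^{l_i}\alpha_{ij}b_{ij}\Big) \le \frac{1}{1+4\epsilon}\,b, \\
& \sum_{j=1}^{l_i}\alpha_{ij} \le 1, \qquad i = 1,\dots,m, \\
& \alpha_{ij} \ge 0, \qquad j = 1,\dots,l_i,\ i = 1,\dots,m.
\end{aligned} \tag{84}$$

Let $\alpha^\epsilon = \{\alpha_{ij}^\epsilon : j = 1,\dots,l_i,\ i = 1,\dots,m\}$ denote an optimal solution of (84). The capacities $(C^0, c^1)$ are given by

$$\begin{aligned}
c_i^1 &= (1+4\epsilon)\Big(1 - \sum_{j=1}^{l_i}\alpha_{ij}^\epsilon\Big)\rho_i, && i = 1,\dots,m, \\
c_{ik}^0 &= (1+4\epsilon)\,\nu_k\Big(\sum_{j=1}^{l_i}\alpha_{ij}^\epsilon b_{ij}(k)\Big)\rho_i, && k = 1,\dots,s,\ i = 1,\dots,m,
\end{aligned} \tag{85}$$

where $\nu_k$ is given by

$$\nu_k = \frac{(1/(1+4\epsilon))\,b_k}{\sum_{i=1}^m\sum_{j=1}^{l_i}\alpha_{ij}^\epsilon\rho_i b_{ij}(k)}, \qquad k = 1,\dots,s. \tag{86}$$

The parameter $\beta$ must satisfy the bound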


(87) /3<emin| min {cl}! 

t {(2,A:) : l<2<m5l</!:<s} 1 ^ 62J (fc) J J 


where $U^\epsilon = \{i : \sum_{j=1}^{l_i}\alpha_{ij}^\epsilon < 1,\ i = 1,\dots,m\}$.

A simple extension of the techniques developed in Section 3 allows one to establish the following analog of Theorem 2.


Theorem 7. Suppose $\epsilon \le \frac14$, $(C^0, c^1)$ are given by (85), $\beta$ satisfies (87) and $y_i(0^-) = (1-\alpha_i^\epsilon)\rho_i$, $i = 1,\dots,m$. Then the reward rate $R(t)$ of the penalty policy $\pi$ satisfies

$$E[R(t)] \ge \sum_{i=1}^m \alpha_i^\epsilon r_i\rho_i(1 - \exp(-\mu_i t)) - \zeta\sum_{i=1}^m (1-\alpha_i^\epsilon) r_i\rho_i - (s+1)\,2e^{-(\epsilon/2)(\beta-4)}\sum_{i=1}^m r_i\rho_i(1 - \exp(-\mu_i t)), \tag{88}$$

where $\alpha_i^\epsilon = \sum_{j=1}^{l_i}\alpha_{ij}^\epsilon$, $i = 1,\dots,m$, $\alpha^\epsilon$ is an optimal solution of the perturbed LP (84) and




5. Extension to general polytopic constraints. In this section we generalize the penalty approach for admission control to a related problem of state control. Although we discuss this problem in the context of a single-resource model, the results easily extend to networks.

The stochastic model is similar to that in Section 3. Requests belong to $m$ Poisson arrival classes. Class $i$ requests have arrival rate $\lambda_i$ and service duration $S_i \sim \exp(\mu_i)$. All the requests arrive at a common infinite capacity system.

Let $x(t) = (x_1(t),\dots,x_m(t)) \in \mathbb{R}_+^m$ denote the number of requests of each class in the system at time $t$. If no control is exercised, then the expected number $E[x_i(t)]$ of class $i$ requests evolves according to $E[x_i(t)] = \rho_i(1 - e^{-\mu_i t})$, $i = 1,\dots,m$. Therefore, the expected steady state load is $\rho$, where $\rho = (\rho_1,\dots,\rho_m)^T \in \mathbb{R}_+^m$.

Let $S \subseteq \prod_{1\le i\le m}[0,\rho_i]$ be a polytope defined as

$$S = \{x : 0 \le x \le \rho,\ Dx \le h\}, \tag{89}$$

where $D \in \mathbb{R}^{s\times m}$ and $h \in \mathbb{R}^s$. We assume, without loss of generality, that $h > 0$. We also assume that the interior $\mathrm{int}(S) \ne \emptyset$; that is, there exists $x \in S$ such that $Dx < h$. In this section the objective is to construct an admission control policy that ensures that $x(t) \in S$ with high probability.

Define the “lifted” set

$$\bar S = \{(x,y) : 0 \le x \le \rho,\ 0 \le y \le \rho,\ D^+x + D^-y \le h + D^-\rho\}, \tag{90}$$

where $D^+ \in \mathbb{R}^{s\times m}$ with $D_{ij}^+ = \max\{D_{ij}, 0\}$ and $D^- \in \mathbb{R}^{s\times m}$ with $D_{ij}^- = \max\{-D_{ij}, 0\}$. It is clear that $x \in S$ implies $(x, \rho - x) \in \bar S$. The “lifting” of the state space introduces a state space expansion that is mimicked by the control policy by adding a fictitious system to the network.
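
The implication "$x \in S$ implies $(x, \rho - x)$ lies in the lifted set" rests on the identity $D^+x + D^-(\rho - x) = Dx + D^-\rho$. A small numerical sketch makes this concrete; the helper names, the tolerance and the $2\times 2$ example are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]
TOL = 1e-9  # tolerance for floating-point comparisons


def split_parts(D: Matrix) -> Tuple[List[List[float]], List[List[float]]]:
    """Entrywise positive part D+ and negative part D- of D, so D = D+ - D-."""
    d_plus = [[max(d, 0.0) for d in row] for row in D]
    d_minus = [[max(-d, 0.0) for d in row] for row in D]
    return d_plus, d_minus


def mat_vec(M: Matrix, v: Sequence[float]) -> List[float]:
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def in_box(v: Sequence[float], rho: Sequence[float]) -> bool:
    return all(-TOL <= a <= r + TOL for a, r in zip(v, rho))


def in_target_set(x, rho, D, h) -> bool:
    """Membership in S = {x : 0 <= x <= rho, D x <= h}."""
    return in_box(x, rho) and all(l <= r + TOL for l, r in zip(mat_vec(D, x), h))


def in_lifted_set(x, y, rho, D, h) -> bool:
    """Membership in {(x, y) : 0 <= x, y <= rho, D+ x + D- y <= h + D- rho}."""
    d_plus, d_minus = split_parts(D)
    lhs = [p + m for p, m in zip(mat_vec(d_plus, x), mat_vec(d_minus, y))]
    rhs = [h_j + s for h_j, s in zip(h, mat_vec(d_minus, rho))]
    return in_box(x, rho) and in_box(y, rho) and all(l <= r + TOL for l, r in zip(lhs, rhs))


# Hypothetical example: two classes, two linear constraints.
D = [[1.0, -1.0], [1.0, 1.0]]
h = [2.0, 8.0]
rho = [5.0, 6.0]
x_inside = [3.0, 2.0]    # D x = (1, 5) <= h
x_outside = [5.0, 1.0]   # D x = (4, 6), first constraint violated
```

For `x_inside`, the pair `(x_inside, rho - x_inside)` satisfies the lifted constraints; for `x_outside`, the corresponding pair violates the first lifted constraint, mirroring the violation in the original set.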

Define $(x^*, y^*) \in \bar{S}$ as

(91) $(x^*, y^*) = \arg\min_{(x,y) \in \bar{S}} \max_{1 \le j \le s} \left\{ \frac{d_j^+ x + d_j^- y}{h_j + d_j^- \rho} \right\}$,

where $d_j^+$ (resp. $d_j^-$) is the $j$th row of $D^+$ (resp. $D^-$). Define


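(92) $\gamma^* = \max_{1 \le j \le s} \frac{d_j^+ x^* + d_j^- y^*}{h_j + d_j^- \rho} = \min_{(x,y) \in \bar{S}} \max_{1 \le j \le s} \left\{ \frac{d_j^+ x + d_j^- y}{h_j + d_j^- \rho} \right\}$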
and


(93) 


\I/* = ^ 


(1 + 3e)/i] 


'max 


Y.y-)). 


Claim 1. The violation $\gamma^* < 1$.

Proof. By assumption, there exists $x \in S$ such that $Dx < h$, that is,
$(d_j^+ - d_j^-)x < h_j$ for all $j = 1, \ldots, s$ or, equivalently,
$(d_j^+ x + d_j^-(\rho - x))/(h_j + d_j^- \rho) < 1$ for all $j = 1, \ldots, s$. The
result follows from the fact that $x \in S$ implies $(x, \rho - x) \in \bar{S}$. □


The quantity $\gamma^*$ is a measure of the size of the set $S$: the smaller the
value of $\gamma^*$, the larger the set $S$.
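
To make the bound in Claim 1 concrete, the following minimal Python sketch evaluates the ratio $\max_j (d_j^+ x + d_j^- y)/(h_j + d_j^- \rho)$ at a single feasible pair. Any such value is an upper bound on $\gamma^*$, because $\gamma^*$ is the minimum of this ratio over the lifted set. The data and the function name `violation` are hypothetical; computing $\gamma^*$ itself would require solving the min-max problem, which this sketch does not attempt.

```python
def violation(x, y, rho, D, h):
    """Largest ratio (d_j^+ x + d_j^- y) / (h_j + d_j^- rho) over the rows j of D."""
    worst = 0.0
    for row, hj in zip(D, h):
        plus_x = sum(max(d, 0.0) * xi for d, xi in zip(row, x))
        minus_y = sum(max(-d, 0.0) * yi for d, yi in zip(row, y))
        minus_rho = sum(max(-d, 0.0) * ri for d, ri in zip(row, rho))
        worst = max(worst, (plus_x + minus_y) / (hj + minus_rho))
    return worst

rho = [4.0, 3.0]                 # hypothetical steady state loads
D = [[1.0, 1.0], [1.0, -1.0]]    # x1 + x2 <= 5 and x1 - x2 <= 1
h = [5.0, 1.0]

x = [2.0, 2.0]                           # satisfies D x < h strictly
y = [r - xi for r, xi in zip(rho, x)]    # complementary point rho - x
upper_bound_on_gamma = violation(x, y, rho, D, h)
print(upper_bound_on_gamma)
```

Because the chosen $x$ satisfies $Dx < h$ strictly, the printed ratio is below 1, in line with the argument in the proof of Claim 1.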

Assumption 2. The ratio of $\mu_{\min} = \min_{1 \le i \le m}\{\mu_i\}$ to
$\mu_{\max} = \max_{1 \le i \le m}\{\mu_i\}$ is bounded below by $\gamma^*$
(i.e., $\mu_{\min}/\mu_{\max} \ge \gamma^*$).

This assumption essentially requires that the size of the target set $S$ be
comparable to the rate mismatch. If the rate mismatch is large, then the
target set $S$ cannot be too small. In particular, if all the departure rates
$\mu_i$ are identical, then Assumption 2 is always satisfied. All the results in
this section assume that $\mu_i$, $i = 1, \ldots, m$, satisfy Assumption 2.

As in all the previous sections, we add one fictitious system that tracks
the rejected requests. Let $x(t)$ [resp. $y(t)$] denote the state of the original
system (resp. fictitious system) at time $t$, and let $s(t) = (x(t), y(t))$. The
control policy $\pi$ uses a penalty function to balance the loads of accepted
and rejected customers to control the state of the system to lie in $S$. The
penalty function $\Psi(s)$ is defined as


(94) 



where the multiplier $\beta$ satisfies


(95) 


B <£{ min 



The policy $\pi$ accepts a class $i$ request if
otherwise, the request is routed to the fictitious system and the policy $\pi$
attaches to it a fictitious service time $S \sim \exp(\mu_i)$ that is independent of
everything else.

We have the following analog of Lemma 2. 

Theorem 8. Suppose $\epsilon < \ldots$, $\beta$ satisfies (95) and $E[\Psi(s(0))] \le \Psi^*$, where
$\Psi^*$ is given by (93). Suppose also that Assumption 2 holds. Then

$E[\Psi(s(t))] \le \Psi^*$ for all $t \ge 0$.


The following result establishes that the policy $\pi$ ensures that the
expected value $E[s(t)]$ of the corresponding state vector lies in an
$\epsilon$-inflation of the target set $S$.

Theorem 9. Suppose $\epsilon < \ldots$, $\beta$ satisfies (95) and the initial state $y(0^-)$
is chosen to ensure that $\Psi((0, y(0^-))) \le \Psi^*$, where $\Psi^*$ is given by (93).
Suppose also that Assumption 2 holds. Then, for all $t \ge 0$,

(96) $d_j E[x(t)] \le h_j + \zeta(h_j + d_j^- \rho) + d_j^- e^{-Mt}(\rho - y(0^-))$, $j = 1, \ldots, s$,

where

$\zeta = \frac{\log(s)}{\beta} + 3\epsilon$ and $M = \mathrm{diag}(\mu_i)$.


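Proof. Repeated application of Jensen's inequality implies

(97)
\[
\begin{aligned}
\exp\Bigl(\beta \max_{1 \le j \le s} E\Bigl[\frac{d_j^+ x(t) + d_j^- y(t)}{h_j + d_j^- \rho}\Bigr]\Bigr)
  &\le \exp\Bigl(\beta E\Bigl[\max_{1 \le j \le s} \frac{d_j^+ x(t) + d_j^- y(t)}{h_j + d_j^- \rho}\Bigr]\Bigr) \\
  &\le E\Bigl[\exp\Bigl(\beta \max_{1 \le j \le s} \frac{d_j^+ x(t) + d_j^- y(t)}{h_j + d_j^- \rho}\Bigr)\Bigr] \\
  &\le E[\Psi(s(t))] \\
  &\le \Psi^*,
\end{aligned}
\]

where (97) follows from the definition of $\gamma^*$ in (92). Taking logarithms, we
get

\[
\begin{aligned}
d_j^+ E[x(t)] + d_j^- E[y(t)] &\le \Bigl(\frac{\log(s)}{\beta} + 1 + 3\epsilon\Bigr)(h_j + d_j^- \rho) \\
  &\le (1 + \zeta)(h_j + d_j^- \rho).
\end{aligned}
\]

The result follows by recognizing that $E[x(t)] + E[y(t)] = (I - e^{-Mt})\rho + e^{-Mt} y(0^-)$,
where $M = \mathrm{diag}(\mu_i)$. □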
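
The chain of inequalities in the proof can be sanity-checked on a small discrete example. The sketch below is only an illustration: it assumes that the penalty is a sum of exponentials of the scaled constraint values, $\sum_j \exp(\beta Z_j)$, which is an assumption made for this example and not a form taken from the text. The distribution and the value of `beta` are hypothetical.

```python
import math

beta = 2.0   # hypothetical multiplier
s = 2        # number of constraints

# Hypothetical distribution of the scaled constraint values Z = (Z_1, Z_2):
# each entry is (probability, [Z_1, Z_2]).
outcomes = [
    (0.5, [0.2, 0.9]),
    (0.3, [0.7, 0.4]),
    (0.2, [1.1, 0.1]),
]

# E[Z_j] for each constraint j.
mean = [sum(p * z[j] for p, z in outcomes) for j in range(s)]

# exp(beta * max_j E[Z_j])
lhs = math.exp(beta * max(mean))
# E[exp(beta * max_j Z_j)], which dominates lhs by Jensen's inequality.
mid = sum(p * math.exp(beta * max(z)) for p, z in outcomes)
# E[sum_j exp(beta * Z_j)], the assumed sum-of-exponentials penalty.
penalty_mean = sum(p * sum(math.exp(beta * zj) for zj in z) for p, z in outcomes)

# Taking logarithms turns a bound on the expected penalty into a bound
# on the largest expected constraint value.
log_bound = math.log(penalty_mean) / beta
print(max(mean), log_bound)
```

The last two quantities show the mechanism used in the proof: a uniform bound on the expected penalty translates, after taking logarithms and dividing by $\beta$, into a bound on $\max_j E[Z_j]$.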


Theorem 9 leaves the choice of the initial loading of the fictitious systems
$y(0^-)$ open. One possible choice for $y(0^-)$ is an optimal solution of the LP


minimize max djM(p-y) 

1<J<S 

(98) 

subject to dj-y < {hj + dj j = 

where $\Psi^*$ is given by (93). The LP (98) minimizes the tracking error subject
to the constraint that $\Psi(0, y(0^-)) \le \Psi^*$.

Our objective in this section was to demonstrate a policy $\pi$ that ensures
that the state $x^{\pi}(t) \in S$ with high probability. Since
$0 \le E[x(t)] \le (I - e^{-Mt})\rho$, Theorem 9 states that $E[x(t)]$ lies in the set

(99) $S^{\epsilon}(t) = \{x : 0 \le x \le \rho,\ Dx \le h + \zeta(h + D^- \rho) + D^- e^{-Mt}(\rho - y(0^-))\}$,

where $\zeta = \frac{\log(s)}{\beta} + 3\epsilon$ and $M = \mathrm{diag}(\mu)$. Suppose the loads $\rho$
are high enough such that $\beta = \frac{1}{\epsilon}\log(s)$ satisfies (95). Then
$S^{\epsilon}(t)$ is an $\epsilon$-blowup of the target set.
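
The inflated right-hand side in (99) is easy to evaluate numerically. The following minimal Python sketch computes $\zeta = \log(s)/\beta + 3\epsilon$ and the vector $h + \zeta(h + D^-\rho) + D^- e^{-Mt}(\rho - y(0^-))$ for hypothetical data; the function names `zeta` and `inflated_rhs` and all numerical values are illustrative assumptions. With $\beta = \frac{1}{\epsilon}\log(s)$ the inflation factor reduces to $\zeta = 4\epsilon$.

```python
import math

def zeta(s, beta, eps):
    """Inflation factor log(s)/beta + 3*eps."""
    return math.log(s) / beta + 3.0 * eps

def inflated_rhs(D, h, rho, mu, y0, t, z):
    """Row-wise value of h_j + z*(h_j + d_j^- rho) + d_j^- e^{-Mt} (rho - y0)."""
    out = []
    for row, hj in zip(D, h):
        d_minus = [max(-d, 0.0) for d in row]
        slack = sum(dm * r for dm, r in zip(d_minus, rho))
        transient = sum(
            dm * math.exp(-m * t) * (r - y)
            for dm, m, r, y in zip(d_minus, mu, rho, y0)
        )
        out.append(hj + z * (hj + slack) + transient)
    return out

eps = 0.05
s = 2
beta = math.log(s) / eps         # the choice beta = (1/eps) * log(s)
z = zeta(s, beta, eps)           # equals 4 * eps up to rounding

rho = [4.0, 3.0]                 # hypothetical loads
mu = [1.0, 3.0]                  # hypothetical service rates
D = [[1.0, 1.0], [1.0, -1.0]]
h = [5.0, 1.0]
y0 = [2.0, 1.0]                  # hypothetical initial fictitious load y(0-)

rhs_start = inflated_rhs(D, h, rho, mu, y0, 0.0, z)
rhs_late = inflated_rhs(D, h, rho, mu, y0, 100.0, z)
print(z, rhs_start, rhs_late)
```

The transient term only affects rows with negative coefficients and decays at the rates $\mu_i$, so for large $t$ the set is governed by the $\zeta$-inflation alone.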

One might be tempted to convert this expected value result into a
sample-path result by using Markov's inequality. However, such an attempt
will be futile. The essential problem is that, although the policy $\pi$ is able
to control the accepted load, the total load of class $i$ requests is
uncontrollable on a sample-path basis. Therefore, one can expect a sample-path
result only if the total load is well behaved. The rest of this section
investigates a limiting regime where this is the case.

Consider the limiting regime defined by (54) in Section 3.3. Choose $\epsilon < \ldots$
and set $\beta = \frac{1}{\epsilon}\log(s)$. Define


(100) $n_0(\epsilon) = \frac{\beta}{\epsilon \min_{1 \le j \le s}\{h_j + d_j^- \rho\}}$.


Then, for all $n \ge n_0(\epsilon)$, the hypotheses of Theorems 8 and 9 are true and
the corresponding bounds hold. Let $\{s^{(n)}(t) : t \ge 0\}$ be the state process when
the control policy $\pi$ is employed in the $n$th system. The results in Section
3.3 imply that

(101) $s^{(\infty)}(t) = \lim_{n \to \infty} s^{(n)}(t)$

exists and is nonrandom. The uniform bound on the penalty function
$\Psi(s^{(n)}(t))$ implies that the sequence $\{s^{(n)}(t) : n \ge n_0(\epsilon)\}$ is uniformly
integrable; therefore,

(102) $s^{(\infty)}(t) = E[s^{(\infty)}(t)] = \lim_{n \to \infty} E[s^{(n)}(t)]$,

leading to the following result. 
Theorem 10. Fix $\epsilon < \ldots$, $\beta \ge \frac{1}{\epsilon}\log(s)$ and $y(0^-)$ such that
$\Psi((0, y(0^-))) \le \Psi^*$. Then, for all $t \ge 0$,

(103) $x^{(\infty)}(t) \in S^{\epsilon}(t) = \{x : 0 \le x \le \rho,\ Dx \le h + 4\epsilon(h + D^- \rho) + D^- e^{-Mt}(\rho - y(0^-))\}$,

where $M = \mathrm{diag}(\mu_i)$.

A possible choice for $y(0^-)$ is an optimal solution of the LP (98).

6. Concluding remarks. In this article, we combined several disparate
research ideas to construct admission control policies: mathematical
programming bounds [Bertsimas, Paschalidis and Tsitsiklis (1994), Gibbens and
Kelly (1995), Bertsimas and Sethuraman (2002), Bertsimas and Niño-Mora (1999b)
and Bertsimas and Chryssikou (1999)], state space expansion [Kamath, Palmon
and Plotkin (1998)], exponential penalty functions [Bienstock (2002)] and
target tracking. These penalty-based policies are approximately optimal when
the requests are sufficiently granular, that is, when the resource requested
by a single request is small compared to the total capacity. The policies
perform well both in the transient period and in steady state. The steady
state performance of the penalty policy is controlled by the target supplied
by a linear program, while the transient performance is controlled by a
fictitious system or, equivalently, by expanding the state space. The
penalty-based policies are also able to track arbitrary polyhedral target
sets.

There are several issues that still remain open. From the numerical
comparison of the bounds in Section 3.2 and the simulation results in Section
3.4, it is clear that in the transient period there is a gap between the
performance of the control policy and the upper bound on achievable
performance. This gap is probably because the capacity of the fictitious
systems is too high for the transient period and, as a result, a larger
fraction of the arriving requests get rejected. Thus, a possible solution
would be to dynamically adapt the capacity of the fictitious systems. While
this approach appears to perform well in simulation, we do not have an
analytical justification for it. Also, it is unsatisfying that in the
Halfin-Whitt regime we are not able to prove the convergence of the process
over compact intervals (see Section 3.3). While it appears that this ought to
be the case, the discontinuity in the control makes such a result hard to
establish.

From the simulation results for the single-resource problem, it appears
that all the benefits of the penalty policy are simply a consequence of the
state space expansion that results from the addition of the fictitious
systems. Further simulation experiments are planned to test this hypothesis.
In any case, state space expansion is a new technique that is worth exploring
further.

In addition, there is always the issue of queuing. Building on the results 
developed here, Cosyn and Sigman (2004) [see also Cosyn (2003)] proposed 
penalty-based control policies for a finite capacity model that allows waiting 
and reneging. The extension to queuing networks with feedback is still open. 

There are also several unresolved issues at the theoretical level. Although
the exponential function allows the proofs to go through, it is not clear if
it is essential to the problem. Young (1995) showed that the exponential
penalty approach for packing and covering problems [see, e.g., Chapter 3 in
Hochbaum (1996)] can be viewed as a derandomization approach, where, at
every stage of the derandomization, one is picking a decision that minimizes
a Hoeffding-type exponential bound on the probability of failure. Something
similar might be at work here; that is, the admission control policy could
be minimizing the worst case bound of leaving the target set. This
interpretation opens the possibility that the penalty policy works because
the exponential function is twisting the dynamics to make the worst sample
paths most likely.

Acknowledgment. The authors thank the anonymous referee for helpful 